PREFACE

This  volume  comprises  the  Proceedings  of  the  Symposium  on 
Mathematical  Pattern  Recognition  and  Image  Analysis  (MPRIA)  held  June  6-8, 
1984,  at  the  NASA/Johnson  Space  Center,  Houston,  Texas. 

The  Symposium  was  initiated  with  a brief  Program  Overview  presented 
by Drs. M. Kristine Butera, NASA Headquarters, and R. P. Heydorn, NASA/JSC.

The  sixteen  papers  of  the  Proceedings  reflect  the  results  of  various 
research  efforts  initiated  during  FY  1983  as  part  of  NASA's  Remote 
Sensing Research Program. Six of the papers present results from the four
research  efforts  carried  out  by  the  following  NASA  principal  investigators: 
R.  P.  Heydorn  - NASA/Johnson  Space  Center 
A.  G.  Houston  - NASA/Johnson  Space  Center 
David  D.  Dow  - National  Space  Technology  Laboratories 
Meemong  Lee  - Jet  Propulsion  Laboratory 
The  remaining  papers  present  second-year  results  from  ten  of  the  eleven 
research  efforts  initiated  July  16,  1982,  under  Contract  NAS  9-16664  and 
carried  out  by  the  following  principal  investigators: 

H.  P.  Decell,  Jr./B.  C.  Peters,  Jr.  - University  of  Houston 

Carl  Morris  - University  of  Texas  at  Austin 

L.  Schumaker/L.  F.  Guseman,  Jr.  - Texas  A&M  University 

K.  S.  Shanmugan  - University  of  Kansas 

E.  Parzen/W.  B.  Smith  - Texas  A&M  University 

A.  H.  Strahler  - Hunter  College 

E.  M.  Mikhail  - Purdue  University 

Grahame  Smith  - SRI  International 

L.  Kanal  - LNK  Corporation 

L.  S.  Davis/A.  Rosenfeld  - University  of  Maryland 

In  an  attempt  to  group  presentations  of  a similar  nature,  the 
Symposium  was  divided  into  three  MATH/STAT  sessions  and  two  PATTERN 
RECOGNITION  sessions. 

The  papers  appear  in  the  Proceedings  in  the  order  in  which  they 
were  presented  at  the  Symposium.  An  agenda  and  a list  of  attendees  who 
registered  for  the  Symposium  are  included  in  the  Appendix. 

L. F. Guseman, Jr.

Principal  Investigator  and 
MPRIA  Program  Coordinator 
Contract  NAS  9-16664 




ESTIMATING  LOCATION  PARAMETERS  IN  A MIXTURE 


R.  P.  Heydorn,  NASA  Johnson  Space  Center 
M.  V.  Martin,  Lockheed  Engineering  and  Management  Services  Company,  Inc. 


ABSTRACT 


This paper considers the problem of estimating the parameters in a finite
mixture of the form $h(x) = \sum_{j=1}^{M} \lambda_j f(x - \xi_j)$, where $\xi_j$, $j = 1, 2, \ldots, M$,
are location parameters. The approach is based on an integral equation
formulation of the form $h_t(x) = \int_a^b f(x-y)\, g_t(y)\, dy$, where $h_t$ is a smoothed
version of h and $g_t$ is a prior function that tends to be concentrated on the
translation values. A solution for $g_t$ that uses the method of regularization
and one based on a posterior operator approach are considered. Numerical
simulations are presented to bring out some of the estimation and numerical
problems of these approaches.


INTRODUCTION 


We begin with a formulation of the mixture problem as essentially given
by Teicher [1]. Let $F = \{f_\xi : \xi \in \mathbb{R}^N\}$ be a family of probability density
functions and let G be a distribution function on $\mathbb{R}^N$, where $\mathbb{R}^N$ is the set of
real vectors of dimension N. For the given G we define the mixture density h
as

(1)  $h = \int f_\xi \, dG(\xi)$

Since all the members of F are used in this definition it makes sense to say
that according to equation (1) F defines a mapping, say $\mathbb{F}$, from the set of
all G-distributions, say $\mathcal{G}$, to the set of all induced h-densities, say H.
If $\mathbb{F}: \mathcal{G} \to H$ is one-to-one and onto, then we say H is identifiable.

In our case we will be interested in the so-called finite mixture. For
the finite mixture the measure induced by G assigns positive probability to
only a finite number of $\xi$-values. Accordingly, the finite mixture can be
written as

(2)  $h = \sum_{j=1}^{M} \lambda_j f_{\xi_j}$

where $0 < \lambda_j < 1$ and $\sum_{j=1}^{M} \lambda_j = 1$. As we will discuss shortly, this
representation of mixtures appears to be most useful in remote sensing
applications.

As described by equation (2), the finite mixture model is a parametric
model and therefore to specify the model one must estimate or otherwise
determine the parameters $M$, $\lambda_j$, $\xi_j$ for $j = 1, 2, \ldots, M$. When H is
identifiable these parameters are uniquely determined and therefore one should
be able to estimate them from just the random observations which have density
h. Note that since the $\lambda_j$'s can be considered as the prior probabilities and
are being estimated from "the data" (i.e., the observations that have density
h), this is a form of the Empirical Bayes Problem as discussed by Robbins [2].


In this paper we will be concerned with the case where F is a translation
family; that is, $f_{\xi_j}(x) = f(x - \xi_j)$. Since the family F is now defined on
the translates of a given function f we will denote it by $F_f$. Yakowitz et al.
[3] and Heydorn et al. [4] have shown that any translation family leads to an
identifiable mixture.
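
To make the translation-family mixture concrete, the following minimal Python sketch evaluates $h(x) = \sum_j \lambda_j f(x - \xi_j)$ and draws observations from it. The Beta(2, 3) kernel f, the function names, and the weights and shifts in the usage note are illustrative assumptions, not values fixed by the text at this point.

```python
import random

def beta23(x):
    """Beta(2, 3) density 12 (1 - x)^2 x on (0, 1), zero elsewhere (assumed kernel f)."""
    return 12.0 * (1.0 - x) ** 2 * x if 0.0 < x < 1.0 else 0.0

def mixture_density(x, weights, shifts, f=beta23):
    """h(x) = sum_j lambda_j f(x - xi_j) for a translation family."""
    return sum(w * f(x - s) for w, s in zip(weights, shifts))

def sample_mixture(n, weights, shifts, rng):
    """Draw n observations: choose a shift xi_j with probability lambda_j,
    then add an independent Beta(2, 3) draw."""
    out = []
    for _ in range(n):
        shift = rng.choices(shifts, weights=weights, k=1)[0]
        out.append(shift + rng.betavariate(2, 3))
    return out
```

For example, `sample_mixture(1000, [0.5, 0.5], [0.0, 0.15], random.Random(0))` produces a sample from an equally mixed pair of translated betas, the kind of data on which the estimators discussed later operate.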

We  have  chosen  to  restrict  ourselves  to  the  translation  family  for  two 
reasons.  First  of  all  remotely  sensed  measurements  of  radiance  values  from  a 
given  class  of  materials  on  the  Earth  can  often  be  reasonably  well  represented 
by  some  translation  family.  And,  therefore,  even  though  we  consider  this  work 
as  being  just  in  the  early  stages,  some  applications  appear  to  be  possible. 

The  other  reason  is  simply  that  by  studying  the  translation  case,  we  believe 
that  considerable  insight  into  the  more  general  problem  can  be  derived. 

Our previous work (cf. reference [4]) addressed the finite mixture model
for the case where the mixture density h is known. In that case a somewhat
more general version of the translation family was treated in the sense that
certain (nuisance) parameters whose values were unknown were allowed. The
approach was based on a theorem of Carathéodory and made use of a constructive
proof of that theorem due to Szegő (as described in Grenander et al. [5]).
When h is numerically well determined we found that M and the translation
parameters $\xi_j$, $j = 1, 2, \ldots, M$, could be computed in many cases. When h is
not known and must be estimated from a moderately small number of random
observations, then the variance in estimates of M and $\xi_j$, $j = 1, 2, \ldots, M$, is
very large. Thus for the small to moderately large sample size cases, we have
chosen to take a fresh look at the problem.

In this paper we will consider an "integral equation" formulation of the
mixture problem and discuss two approaches for obtaining a numerical solution.
One approach is based on the regularization method of Tikhonov et al.
[6]. Wahba [7] discusses this method in connection with density estimation
problems; Rice et al. [16] in connection with estimating derivatives and
deconvolution of densities; and Medgyessy [8] in connection with mixture
problems. The other approach is based on the formulation of a posterior
operator that, in a certain sense, avoids the inversion problem implicit in
the regularization method.

Applications  to  Remote  Sensing 

The finite mixture model of equation (2) provides us with a representation
for remotely sensed measurements that can be applied toward the solution
of  several  application  problems.  We  mention  three  such  problems. 

One  application  of  remotely  sensed  data  deals  with  inventories  of 
selected  materials  on  the  surface  of  the  Earth  (for  example,  the  determination 
of  the  acres  of  wheat,  the  acres  of  conifer  trees,  or  the  square  miles  of 
water).  MacDonald  et  al.  [9],  for  example,  discussed  the  use  of  remotely 
sensed  data  for  inventorying  wheat.  A popular  approach  for  this  application 
is  simply  to  classify  each  measurement  observation  into  one  of  M given 
material  classes  and  count  the  classifications  to  determine  the  proportion  of 
the  area  surveyed  that  belongs  to  a given  class.  This  method  only  works  well, 
in  general,  where  classification  errors  are  small.  Large  classification  error 
can lead to biases in the inventory. If in the mixture model of equation (2),
a given parameter $\xi_j$ can be uniquely associated with a given material class,
then $\lambda_j$ is the proportion of that material.
In other words, given the knowledge that each material class on the ground can
be represented by a member of some known family F that leads to identifiable
mixtures, one should be able to determine the proportion of each
material class without bias. We point out, however, that the mixture model does
not directly give us a way of assigning a material class name to each $\xi_j$-value.
This has been called the "labeling problem." One application of the finite
mixture model to crop inventories and an approach to the labeling problem are
discussed by Lennington et al. [10].

Another application of remotely sensed data is concerned with the
determination of certain properties of materials on the Earth's surface. Goel
et al. [11], for example, consider the problem of solving for the variables in
the Suits [12] vegetation canopy model from the light reflected (more
specifically, the reflectance) from the canopy at several view angles. These
variables include the leaf transmittance, the leaf reflectance, the soil
reflectance, and the projections on the horizontal and vertical planes of the
average leaf area per unit volume. If we let $x_k$, $k = 1, 2, \ldots, K$, represent
reflectances and $y_k$, $k = 1, 2, \ldots, K$, the canopy variables, then Goel et al.
address the problem of solving for the canopy variables given the equations
$x_k = \beta_k(y_1, y_2, \ldots, y_K)$, $k = 1, 2, \ldots, K$. If x is the vector of
reflectance variables, y the vector of canopy variables, and T is a 1-1
transformation determined by the above equations, then x = T(y). Under this
transformation the probability that the canopy variable values lie in a given
set, say A, is given for the kth species by $\int_{T(A)} f_{\xi_k}(x)\, dx$, where $f_{\xi_k}$ is
determined from the mixture model. In this approach the transformation T is
never inverted. Inversion is generally a difficult numerical operation for
the Suits model, for example.

The final application we have in mind is classification. The classification
function, $\phi$, is a function that assigns each measurement x to one, and
only one, of M possible classes. If each parameter $\xi_j$, $j = 1, 2, \ldots, M$, in
the mixture model in equation (2) is uniquely related to one of these classes,
then the Bayes classification function becomes

$\phi(x) = k$ iff $\lambda_k f_{\xi_k}(x) \ge \max_j \lambda_j f_{\xi_j}(x)$

If one is searching for members of a particular class, say k, then a map
related to a class map could be obtained by observing

$\lambda_k f_{\xi_k}(x) \Big/ \sum_{j=1}^{M} \lambda_j f_{\xi_j}(x)$

which is the posterior probability that x is an observation from class k.
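
The classification rule and the posterior probability above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gamma-type kernel, its scale 0.1, and the function names are hypothetical choices, and the mixing proportions and translation values are taken as known.

```python
import math

def gamma2(x, gamma=0.1):
    """Assumed kernel: x exp(-x/gamma) / gamma^2 for x > 0, zero otherwise."""
    return x * math.exp(-x / gamma) / gamma ** 2 if x > 0.0 else 0.0

def posterior(x, weights, shifts, f=gamma2):
    """Posterior probabilities lambda_k f(x - xi_k) / sum_j lambda_j f(x - xi_j)."""
    terms = [w * f(x - s) for w, s in zip(weights, shifts)]
    total = sum(terms)
    if total == 0.0:
        return None  # x lies outside the support of every component
    return [v / total for v in terms]

def classify(x, weights, shifts, f=gamma2):
    """Bayes rule: 1-based index of the class with the largest lambda_k f(x - xi_k)."""
    post = posterior(x, weights, shifts, f)
    if post is None:
        return None
    return 1 + max(range(len(post)), key=lambda k: post[k])
```

Plotting `posterior(x, ...)[k - 1]` over the measurements gives the kind of map for class k described above, without forcing a hard class assignment.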


An  Integral  Equation  Formulation  of  the  Mixture 

The formulation of the finite mixture given by equation (2) treats the
quantities of interest $M$, $\lambda_j$, $\xi_j$, $j = 1, 2, \ldots, M$, as parameters. While in
many cases the estimation of the parameters $\lambda_j$, $\xi_j$, $j = 1, 2, \ldots, M$, can be
done quite well, the estimation of M appears to be a more difficult problem,
as may be suggested by the fact that fewer papers address this problem. Part
of the difficulty seems to be that the parametric formulation tends to isolate
each parameter so that a separate estimator is needed for each parameter. The
formulation given by equation (1) gets around this difficulty somewhat by
"bundling up" these estimation problems into one problem, viz., the estimation
of a function G.

If G were an absolutely continuous distribution function, not a step
(Heaviside) function as in the case of a finite mixture, then G would define a
density function g and we could write

$h(x) = \int_{\mathbb{R}^N} f(x-y)\, g(y)\, dy$

where $f \in F_f$. This gives an integral equation representation for the
mixture. For a finite mixture, however, g is a delta function, or more
correctly a singular generalized function (as discussed, e.g., by Gel'fond et
al. [13]), and so the integral equation representation is not correct.
However, when $f \in F_f$ we can consider a "smoothed" version of the finite mixture
and thereby use the integral equation representation. We now discuss this
approach in the scalar setting (i.e., $\mathbb{R}^N = \mathbb{R}$).

Following the definition of a generalized function as given in Gel'fond
et al. [13], let $(g, \phi)$ denote a linear functional defined by g where $\phi \in D$.
Here D is a family of functions, each member of which has bounded support and
is continuously differentiable. For the finite mixture case we have

$(g, \phi) = \sum_{j=1}^{M} \lambda_j \phi(\xi_j)$

Since our mixture is a convolution between the kernel function
f and g, we have (denoting convolution by "*")

$(f*g, \phi) = \left(\sum_{j=1}^{M} \lambda_j f_{\xi_j}, \phi\right) = (h, \phi)$

where $f_{\xi_j}(x) = f(x - \xi_j)$. And smoothing with a function $t \in L_1$ gives

$(t*h, \phi) = (t*(f*g), \phi) = (f*(t*g), \phi)$

Since $(t*g, \phi)$ is a regular generalized function we have, letting $h_t = t*h$ and
$g_t = t*g$,

(3)  $h_t(x) = \int_a^b f(x-y)\, g_t(y)\, dy$

where [a,b] contains the domain of $g_t$.

We can choose the support of t to be small so that this integral equation
formulation (equation (3)) can be a good approximation to the finite mixture.
For example if we choose

(4)  $t(x) = (1 - n x^2)^2$, $|\sqrt{n}\, x| < 1$

and 0 otherwise, then $(t*h)(x) \to h(x)$ $(n \to \infty)$ if x is any continuity point of
h (cf. Bochner [14]). And since generalized function spaces are complete and
H is assumed to be identifiable, $\lim_{n \to \infty} (t*h, \phi) = (f*g, \phi)$.

Equation (3) expressed in terms of the operator $\mathbb{F}$ becomes $h_t = \mathbb{F} g_t$.
Whenever f is continuous on [a,b] (and therefore bounded), $\mathbb{F}$ is a compact
linear operator. This means (cf. Kolmogorov et al. [15]) that $\mathbb{F}^{-1}$ cannot be
bounded on an infinite-dimensional space and hence $\mathbb{F}^{-1}$ would not be a
continuous linear operator. Thus, when we attempt to estimate $h_t$ by
$\hat{h}_t = h_t + e_t$, where $e_t$ is an error function, $\mathbb{F}^{-1} \hat{h}_t$ could be grossly different
from $g_t$ (as measured by the supremum norm).

One approach for solving this problem is to use the regularization method
of Tikhonov et al. [6]. In this approach one defines a functional, $S_\alpha$
(equation (5)), in which $g_t$ is assumed to be differentiable of order n. Here
$\alpha$ is called the regularization parameter. The resulting solutions, say $g_{t,\alpha}$,
which are obtained by minimizing $S_\alpha$, are approximations to $g_t$.
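
For orientation, a common textbook form of Tikhonov's regularization functional is written out below. It is offered as an assumption about the general shape of such a functional, not as a reproduction of the authors' equation (5); the weight functions $p_k$ are hypothetical. The second expression is the variant that keeps only the second-derivative penalty.

```latex
% Assumed generic form of a Tikhonov functional of order n
S_\alpha(g_t) = \int \bigl( (\mathbb{F} g_t)(x) - h_t(x) \bigr)^2 \, dx
  + \alpha \sum_{k=0}^{n} \int_a^b p_k(y) \bigl( g_t^{(k)}(y) \bigr)^2 \, dy ,
  \qquad p_k(y) \ge 0 .

% Variant that penalizes only the second derivative
S_\alpha(g_t) = \int \bigl( (\mathbb{F} g_t)(x) - h_t(x) \bigr)^2 \, dx
  + \alpha \int_a^b \bigl( g_t''(y) \bigr)^2 \, dy .
```

In either form the first term measures fidelity to the data and the second penalizes roughness, so a larger $\alpha$ gives a smoother but more biased solution.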


If we replace $h_t$ by some approximation $\hat{h}_t$ then as our approximations to
$h_t$ become successively better we can successively decrease $\alpha$ so that $g_{t,\alpha}$
approaches (in some sense) $g_t$. Wahba [7], for example, considers such an
estimation problem and uses the method of cross validation to pick $\alpha$-values.

Rather than considering the operator $\mathbb{F}$ whose inverse is discontinuous,
it is possible in some cases to derive another linear compact operator from
the kernel f, say $\mathbb{P}$, so that $\mathbb{P}h = g_t$. We now consider this approach for
solving for $g_t$.

Assuming for a moment that we can take the Fourier transform in equation
(3), and letting "~" denote the corresponding transformed function, we have
(for $\omega = 2\pi\nu$)

$\tilde{h}_t(\omega) = \tilde{f}(\omega)\, \tilde{g}_t(\omega)$

and, since $h_t = t*h$,

$\frac{\tilde{t}(\omega)}{\tilde{f}(\omega)}\, \tilde{h}(\omega) = \tilde{g}_t(\omega)$

If $\tilde{t}(\omega)/\tilde{f}(\omega)$ has an inverse Fourier transform, say $t_f$, then

$\int t_f(x - y)\, h(y)\, dy = g_t(x)$

and if further $t_f$ is continuous on the bounded interval [a', b'], then the
linear operator $\mathbb{P}$ defined by the kernel $t_f$ is a compact operator as was $\mathbb{F}$.
We refer to $\mathbb{P}$ as the posterior operator since it operates on h to produce a
prior function.

If we consider, for example, the function t of equation (4), then since
$(t*g, \phi) = (g, t*\phi)$ and $(t*\phi)(x) \to \phi(x)$ $(n \to \infty)$ we would have
$\lim_{n \to \infty} (t_f * h, \phi) = (g, \phi)$.
Thus, by properly choosing t we can obtain a good approximation to g.

An example of a case where $\tilde{t}(\omega)/\tilde{f}(\omega)$ has an inverse Fourier transform
comes from the gamma family of densities where

$f(x) = \frac{1}{n!\, \gamma^{n+1}}\, x^n e^{-x/\gamma}$ for $x > 0$

and is 0 otherwise. Later we will consider some numerical examples of mixtures
of gammas. The Fourier transform of $t_f$ in this case is
$(1 + i\omega\gamma)^{n+1}\, \tilde{t}(\omega)$, and hence, if t has n+1 derivatives in some interval
[a,b] and $t^{(n+1)}(a) = t^{(n+1)}(b) = 0$, then the kernel function $t_f$ is of the form

$t_f(x) = \left(1 + \binom{n+1}{1}\gamma D + \binom{n+1}{2}\gamma^2 D^2 + \cdots + \gamma^{n+1} D^{n+1}\right) t(x)$

where D is the standard derivative operator. Notice that the
above gamma function does not have an (n+1)st derivative at x = 0. The role of
the function t in this case is to smooth the gamma function so that the (n+1)st
derivative exists.
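
As a worked instance of the expansion above, the case n = 1 can be written out in full. The sketch assumes the transform convention $\tilde{f}(\omega) = \int f(x) e^{-i\omega x}\, dx$, under which differentiation corresponds to multiplication by $i\omega$; the convention itself is an assumption, since the text does not state one.

```latex
% Worked case n = 1 (transform convention assumed as stated in the lead-in)
f(x) = \frac{1}{\gamma^{2}}\, x\, e^{-x/\gamma} \quad (x > 0), \qquad
\tilde{f}(\omega) = \frac{1}{(1 + i\omega\gamma)^{2}},

\frac{\tilde{t}(\omega)}{\tilde{f}(\omega)}
  = \bigl( 1 + 2 i\omega\gamma - \omega^{2}\gamma^{2} \bigr)\, \tilde{t}(\omega)
\quad\Longrightarrow\quad
t_f(x) = t(x) + 2\gamma\, t'(x) + \gamma^{2}\, t''(x).
```

The coefficients 1, 2, 1 are the binomial coefficients of $(1 + \gamma D)^2$, and the result requires t to have two derivatives.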


Numerical  Solutions 


To  understand  numerical  estimation  problems  associated  with  both  the 
regularization  and  the  posterior  operator  methods  discussed  above,  we  conducted 
simulation  studies  using  translations  of  beta  and  gamma  distributions.  One  of 
the  reasons  for  choosing  these  families  is  that  they  have  positive  support  and 
are  skewed.  Both  of  these  properties  are  also  found  in  typical  densities  of 
remotely  sensed  measurements. 

For the regularization approach we chose mixtures of translates of beta
densities. Our mixture density, h, in this case was

$h(x) = \sum_{j=1}^{M} \lambda_j f(x - \xi_j)$

where

$f(x) = \begin{cases} 12\,(1-x)^2 x, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$

In equation (3), $h_t$, which is an approximation to h, is of the form

$h_t(x) = E(t(x - X))$

where X is a random variable distributed according to the mixture h. Given a
set of iid random variables $X_1, X_2, \ldots, X_n$, each distributed as X, we can
estimate $h_t$ as

$\hat{h}_t(x) = \frac{1}{n} \sum_{j=1}^{n} t(x - X_j)$


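For our simulation we chose t to be a third order B-spline, i.e.,

$t(x) = \begin{cases} \frac{1}{2}\left(\frac{x}{a}\right)^2, & 0 \le x < a \\ \frac{1}{2}\left(-2\left(\frac{x}{a}\right)^2 + 6\,\frac{x}{a} - 3\right), & a \le x < 2a \\ \frac{1}{2}\left(3 - \frac{x}{a}\right)^2, & 2a \le x < 3a \\ 0, & \text{otherwise} \end{cases}$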
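
The smoothed estimate $\hat{h}_t$ built from this spline can be sketched as follows. The function names are hypothetical, and dividing by the spacing a is an added normalization assumption: the spline as written integrates to a, so the division makes the estimate integrate to one.

```python
def bspline3(x, a):
    """Third-order (quadratic) B-spline on knots 0, a, 2a, 3a; zero elsewhere."""
    u = x / a
    if 0.0 <= u < 1.0:
        return 0.5 * u * u
    if 1.0 <= u < 2.0:
        return 0.5 * (-2.0 * u * u + 6.0 * u - 3.0)
    if 2.0 <= u < 3.0:
        return 0.5 * (3.0 - u) ** 2
    return 0.0

def h_hat(x, data, a):
    """Average of t(x - X_j) over the sample, divided by a.

    The division by a is an assumption added here so that the estimate
    integrates to one; the averaging itself follows the formula above.
    """
    return sum(bspline3(x - xj, a) for xj in data) / (len(data) * a)
```

Because the spline is supported on [0, 3a] rather than centered at zero, each observation contributes mass to the interval just to its right, which shifts the estimate by about 1.5a relative to a centered kernel.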


Moreover, we used the following B-spline approximation to $g_t$, viz.,

(6)  $g_t(x) = \sum_{i=0}^{L+1} c_i\, t(x - i a)$

Finally, we used a modification of the regularization formulation of
equation (5); that is, we considered only the second derivative of $g_t$ in
the constraint rather than all of its derivatives. This form of the
regularization problem is considered by Rice et al. [16]. In that paper bias and
variance expressions for the solution function (which is also approximated by
B-splines) are derived.

To obtain $\hat{g}_t$ we have to solve for the c-coefficients in equation (6).
Letting these solution coefficients be $\hat{c}_i$, $i = 0, 1, 2, \ldots, L+1$, we have from
equation (5) with the above approximations,

(7)  $\hat{c} = (A'A + N\alpha B)^{-1} A' \hat{h}$


where

$A = [a(j,i)]$

$\hat{h}' = (\hat{h}_t(x_1), \hat{h}_t(x_2), \ldots, \hat{h}_t(x_N))$

$\hat{c}' = (\hat{c}_0, \ldots, \hat{c}_{L+1})$

n = number of random observations from the mixture

N = number of points at which g is estimated (this is the range of the
"j" index)

L+2 = number of B-splines (this is also the range of index "i")

a = interval spacing of the B-splines

It is seen from equation (7) that $\hat{c}$ is a ridge-regression-like
estimator which reduces the variance of each $\hat{c}_i$ by adding bias. The size of
the bias is influenced by the regularization parameter $\alpha$.
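
The ridge-regression-like character of equation (7) can be illustrated with a small, dependency-free Python sketch. The penalty matrix B is taken here to be the second-difference matrix D'D, which is an assumption standing in for the second-derivative penalty; the function names and the toy matrix A in the usage note are likewise hypothetical.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def second_difference_penalty(m):
    """B = D'D, where D takes second differences of m coefficients (assumed penalty)."""
    B = [[0.0] * m for _ in range(m)]
    stencil = (1.0, -2.0, 1.0)
    for r in range(m - 2):
        for p in range(3):
            for q in range(3):
                B[r + p][r + q] += stencil[p] * stencil[q]
    return B

def ridge_like(A, h, alpha, B):
    """c = (A'A + N*alpha*B)^{-1} A'h, with N the number of rows of A."""
    N, m = len(A), len(A[0])
    lhs = [[sum(A[k][i] * A[k][j] for k in range(N)) + N * alpha * B[i][j]
            for j in range(m)] for i in range(m)]
    rhs = [sum(A[k][i] * h[k] for k in range(N)) for i in range(m)]
    return solve(lhs, rhs)
```

With A set to an identity matrix, `alpha = 0` returns the data unchanged, while a very large `alpha` forces the coefficients toward a straight line, which is the bias the regularization parameter introduces in exchange for lower variance.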


Figures 1, 2, and 3 are examples where a mixture of two beta densities,
with equal mixing, is considered. In these figures the spacing between the
B-splines was .05. We see from figure 1 that, for the separation $\xi_2 - \xi_1$ used
there, the betas are too close to distinguish the fact that two components are
present. For the separation used in figure 2 we begin to see two modes in the
prior function plus some oscillatory behavior; when $\xi_2 - \xi_1 = .15$, as in
figure 3, the two modes are very distinct. The major peaks of the graph of the
prior function, $g_t$, appear to be located to within .02 of the translation
values ($\xi_1$ and $\xi_2$) in figures 2 and 3.

For the posterior operator approach we chose mixtures of translates of
gamma distributions. In this case

$\hat{g}_t(x) = \frac{1}{n} \sum_{i=1}^{n} t_f(x - X_i)$

where now the random variables $X_i$, $i = 1, 2, \ldots, n$, each have density h and
the components of h are of the form

$f(x) = \begin{cases} \frac{1}{\gamma^2}\, x\, e^{-x/\gamma}, & x > 0 \\ 0, & x \le 0 \end{cases}$

The posterior kernel, $t_f$, is of the form

$t_f(x) = t_4(x) + \frac{2\gamma}{\Delta}\left(t_3(x) - t_3(x - \Delta)\right) + \frac{\gamma^2}{\Delta^2}\left(t_2(x) - 2 t_2(x - \Delta) + t_2(x - 2\Delta)\right)$

where the $t_k$, k = 2, 3, 4, are 2nd, 3rd, and 4th order B-splines respectively
($t_3$ is the same as the function t considered in the regularization method;
the expressions for $t_4$ and $t_2$ can be found in Schumaker [17]).
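
A minimal Python sketch of this posterior kernel and of the estimate $\hat{g}_t$ follows. It assumes uniform (cardinal) B-splines with knot spacing $\Delta$, built by the standard Cox-de Boor recursion and left unnormalized; the function names and the values of gamma and delta in the usage note are hypothetical.

```python
import math

def bspline(k, u):
    """Cardinal B-spline of order k (degree k - 1) on knots 0, 1, ..., k."""
    if k == 1:
        return 1.0 if 0.0 <= u < 1.0 else 0.0
    return (u * bspline(k - 1, u) + (k - u) * bspline(k - 1, u - 1.0)) / (k - 1)

def t_f(x, gamma, delta):
    """Posterior kernel t_4 + 2*gamma*t_4' + gamma^2*t_4'', with the derivatives
    of the cubic spline written as differences of lower-order splines."""
    u = x / delta
    t4 = bspline(4, u)
    d1 = (bspline(3, u) - bspline(3, u - 1.0)) / delta
    d2 = (bspline(2, u) - 2.0 * bspline(2, u - 1.0) + bspline(2, u - 2.0)) / delta ** 2
    return t4 + 2.0 * gamma * d1 + gamma ** 2 * d2

def gamma_density(x, gamma):
    """Component density x exp(-x/gamma) / gamma^2 for x > 0, zero otherwise."""
    return x * math.exp(-x / gamma) / gamma ** 2 if x > 0.0 else 0.0

def g_hat(x, data, gamma, delta):
    """Estimate of the smoothed prior: average of t_f(x - X_i) over the sample."""
    return sum(t_f(x - xi, gamma, delta) for xi in data) / len(data)
```

A useful check on the construction is that convolving `t_f` with `gamma_density` numerically recovers the cubic spline $t_4$, which is the sense in which the kernel undoes the gamma component. Evaluating `g_hat` on a grid then gives a curve whose peaks are read off as estimates of the translation values.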

Figures 4 to 7 are example graphs for mixtures of two gamma densities
equally mixed. In these cases the spacing between the B-splines was .1 rather
than .05 as in the previous figures. Figure 4 shows that $\xi_2 - \xi_1 = .1$ is too
small to distinguish the existence of two components in the mixture. At
$\xi_2 - \xi_1 = .15$ we begin to see two components in figure 5. At $\xi_2 - \xi_1 = .2$,
figure 6 shows two distinct components, and finally at $\xi_2 - \xi_1 = .35$,
figure 7, the function has two very distinct peaks. The graphs after
figure 4 also show that the peaks in $g_t$ occur at essentially the translation
values.

Figures 8 and 9 are examples where the components are unequally mixed.
In figure 9 we see that for $\lambda_1 = .1$ and $\lambda_2 = .9$ the first peak can get
confused with the oscillatory behavior of the $g_t$ functions.

Finally, figure 10 shows a case where the number of random observations
from the mixture was only 100. In the previous graphs 1000 observations were
used. This suggests, perhaps, that far fewer than 1000 observations could
have been used for these examples.

CONCLUDING  REMARKS 


Our  purpose  in  these  studies  is  to  explore  some  of  the  estimation  and 
numerical  problems  associated  with  solving  for  the  prior  function  in  a finite 
mixture.  We  chose  to  begin  by  considering  mixtures  of  translates  partially 
because  convolving  the  mixture  with  some  smooth  function  is  the  same  as 
smoothing  the  prior  function.  This  fact  leads  to  an  integral  equation 
representation  of  the  finite  mixture.  The  solution  of  the  resulting  integral 
equation  by  either  the  regularization  method  or  the  posterior  operator  method 
leads  to  a graph  of  the  prior  function  in  which  the  number  of  components  in 
the  mixture  can  often  be  easily  determined  (at  least  visually)  and  the 
translation values can be reasonably well approximated. However, by smoothing
the prior function, it now appears to be difficult to estimate the mixing
proportions, $\lambda_j$, $j = 1, 2, \ldots, M$. When, for example, maximum likelihood methods
are used to estimate these parameters, for a given M, the $\lambda_j$ values are often
easily estimated.

There are of course a number of numerical problems associated with these
numerical approaches. The fact that the spline solution for the prior function
tends to oscillate and can go negative is disturbing. We hope to examine
some of these problems in future studies.


ABSTRACT


In this paper we describe our work in bringing nonparametric
methods to bear on data-intensive problems faced by NASA. The theoretical
development of efficient multivariate density estimators and the
novel use of color graphics workstations are reviewed. The use of
nonparametric density estimates for data representation and for Bayesian
classification is described and illustrated. Our progress in building
a data analysis system in a workstation environment is reviewed and
preliminary runs are presented.

1. Introduction

Our research program has focused on developing strategies based on
nonparametric density estimation to aid in the varied large data
analysis tasks faced by NASA. Our efforts have been twofold: first, to
build these statistical tools into a highly interactive color computer
graphics environment to experiment with multivariate data analysis;
second, to vigorously pursue theoretical research in multivariate density
estimation that will directly aid the practical application of our
statistical tools.

In the first area, we have developed software based on new density
estimation algorithms particularly well-suited for interactive computing.
This software has been tested on simulated and real data sets with
two, three, and four variables. The psychological impact of data
analysis performed in this manner has been favorable and we continue our
efforts to greatly facilitate use of our tools for new data sets. In
the second area, we are examining those theoretical issues involved in
our application of multivariate density estimation. We have developed a
new density estimator called the averaged shifted histogram that is many
times faster than the well-known kernel estimator. The statistical
properties of this estimator are contained in Scott [7]. The calibration
of density estimators is addressed in a joint paper with Terrell [11].
The result of this work should be an exportable product that can be used
by researchers with varying levels of expertise in statistics.
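
The idea behind the averaged shifted histogram can be sketched in one dimension as follows: m ordinary histograms with a common bin width h, whose origins are shifted by h/m, are averaged, which amounts to a triangular weighting of counts in fine bins of width h/m. The code below is a minimal illustration under that reading; the function name, arguments, and grid handling are assumptions and not the authors' software.

```python
import math

def ash1d(data, lo, hi, h, m):
    """Averaged shifted histogram on [lo, hi) with bin width h and m shifts.

    Returns (centers, density) on fine bins of width h / m. Points outside
    [lo, hi) are ignored, so the grid should extend past the data by at
    least h on each side.
    """
    delta = h / m
    nbins = int(math.ceil((hi - lo) / delta))
    counts = [0] * nbins
    for x in data:
        k = math.floor((x - lo) / delta)
        if 0 <= k < nbins:
            counts[k] += 1
    n = len(data)
    density = []
    for k in range(nbins):
        s = 0.0
        for i in range(1 - m, m):
            j = k + i
            if 0 <= j < nbins:
                s += (1.0 - abs(i) / m) * counts[j]  # triangular weights
        density.append(s / (n * h))
    centers = [lo + (k + 0.5) * delta for k in range(nbins)]
    return centers, density
```

Only one pass over the data is needed to fill the fine-bin counts; everything after that works on the counts alone, which is consistent with the speed advantage over a kernel estimator noted above. Setting `m = 1` reduces the sketch to an ordinary histogram.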

In this paper we illustrate several applications of our methods and
the workstation environment. First, our multivariate density function
graphical representation is a powerful aid to constructing and
understanding Gaussian mixture models for data in three and four dimensions.
Second, the power of the averaged shifted histogram as the basis of a
Bayesian classifier is illustrated on bootstrapped trivariate data. The
performance is compared to that of a maximum likelihood Gaussian
classifier. Third, we examine the practical significance of relatively
modest departures of data from Gaussian assumptions. Our results indicate
that some useful gains may be realized for such data. Fourth, we
demonstrate a prototype of a data analysis system based on the Silicon
Graphics Iris workstation. This system allows rapid comparison of data in up
to four dimensions. We believe that significant gains will be realized
as we break the two-dimensional barrier and begin to work directly with
three and four dimensional data. We have attempted to create algorithms
that make it possible to analyze such data with modest computational
requirements, attempting to realize real-time representation and
analysis.

2. Review

2.1 Graphical Tools in Data Analysis

A recent theme in multivariable data analysis as advocated by, for
example, John and Paul Tukey [13] emphasizes graphical techniques for
looking for multidimensional structure in data. The bivariate scatter
diagram has been a very useful tool in this approach. For data in more
than two dimensions, careful selection of bivariate projections can
reveal structure in higher dimensions; see, for example, a description
of the projection pursuit algorithm [3]. Alternatively, glyphs may be
drawn instead of dots in a bivariate scattergram and data values not
displayed are represented by features in the glyph, such as length,
angle, etc. Computer graphics workstations have recently made trivariate
scatter diagrams feasible. A true three-dimensional effect may be
had by either continuous rotation of the scatter diagram or by a variety
of stereographic techniques using red/green or polarized glasses.
Holograms and rapidly vibrating mirrors also can provide 3-D effects. For
data with more than three variables, side-by-side scatter diagrams of
subsets of variables with visual links (such as coloring the same point
in the different diagrams) allow a representation of the data.

Scatter diagrams do have limitations in data analysis. The most
important problems relate to sample size. For moderately large samples
(n > 500) data replication (or overstriking on the graphical medium)
begins to occur frequently. This problem has been referred to as the
problem of "too much ink" [12]. With continuous rotation many more
points are viewable but current computer technology limits real-time
rotations to about one thousand points. Secondly, clusters of points
that are close together are difficult to detect in scatter diagrams. In
other words, scatter diagrams provide only modest indications of the
density of points in a given region. Thirdly, our impression of data
from the same underlying density function is highly dependent on the
sample size. This makes comparisons of scatter diagrams with different
sample sizes nontrivial. The eye naturally leaves the center of the
data and focuses on outliers and apparent structure (lines) in outlying
regions. Such features may or may not be of great importance depending
on the objectives of the data analysis.


We also advocate using scatter diagrams for looking at data. However,
since we are interested in discovering structure such as modes and
high density regions, we have found that the density function is a more
useful tool when taking a preliminary look at data in several dimensions.
The density function does not change with sample size, although
the quality of estimation changes. In a sense the scatter diagram
points to the density function.

2.2 Computational and Representational Problems

Nonparametric density estimation methods for multivariate data are
often simple extensions of well-studied univariate versions. The multivariate
histogram is a computationally efficient estimator but suffers
from empty bin problems and bin edge effects. Statistically more efficient
and smoother multivariate estimators may be obtained by kernel
methods; see Tapia and Thompson [10]. Thus we believe the fixed multivariate
kernel estimator of Cacoullos [2] is a useful technique for
data in 2-4 dimensions. Unfortunately, computational requirements grow
rapidly in higher dimensions if one desires to evaluate the estimate
over a representative multivariate mesh. The estimator also requires
the entire raw data in order to compute the pointwise estimates. Some
research has focused on one and two dimensional numerical approximations
to kernel estimates in order to achieve computational efficiency [9].
However, few results are currently available for more variables.

Another approach is to construct a frequency polygon estimator,
which is formed by connecting with straight lines the mid-bin values of
a histogram. This estimator has the same order of statistical
efficiency as the kernel estimator and also the computational efficiency
of the histogram. The frequency polygon of the ordinary histogram works
well for two dimensional data; however, bin edge effects still can be a
problem for small samples and in higher dimensions. Thus we have
recently proposed a new density estimator based on a frequency polygon
of a generalized histogram estimator called the averaged shifted histogram
(ASH) [7]. The ASH is simply the pointwise average of m histograms
with common equally spaced bins of width h but different bin origins
$t_0 + ih/m$, $i = 0, \ldots, m-1$. Thus the ASH looks like a histogram with bin width
h/m. As $m \to \infty$ the ASH is identical to the triangle kernel estimate.
Values of m between 3 and 10 are sufficient for most purposes. Multivariate
versions are easily constructed by shifting and averaging in
all coordinate directions.
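
The definition of the ASH can be illustrated with a minimal univariate sketch in Python. It is not the authors' implementation: the function names `histogram_density` and `ash_density`, the half-open bin convention, and the sample values are assumptions chosen for illustration. The code averages m ordinary histograms whose origins are shifted by h/m, exactly as in the definition above.

```python
import math

def histogram_density(x, data, origin, h):
    """Ordinary histogram estimate at x, bins [origin + k*h, origin + (k+1)*h)."""
    k = math.floor((x - origin) / h)      # bin location: subtraction and division
    lo = origin + k * h                   # left edge of the bin containing x
    count = sum(1 for d in data if lo <= d < lo + h)
    return count / (len(data) * h)

def ash_density(x, data, t0, h, m):
    """Pointwise average of m histograms with origins t0 + i*h/m, i = 0..m-1."""
    origins = (t0 + i * h / m for i in range(m))
    return sum(histogram_density(x, data, o, h) for o in origins) / m

# Hypothetical sample, for illustration only.
sample = [0.1, 0.2, 0.25, 0.9]
print(ash_density(0.3, sample, t0=0.0, h=0.5, m=1))   # ordinary histogram
print(ash_density(0.3, sample, t0=0.0, h=0.5, m=2))   # average of two shifted histograms
```

With m = 1 the function reduces to the ordinary histogram. A production version would precompute counts on the fine mesh of width h/m rather than rescanning the data for every evaluation point.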

Representational difficulties have been addressed for three and
four variable density estimates (function surfaces in four and five
dimensions, respectively) by displaying generalized contour plots. For
trivariate data, a particular contour of $\hat f(x,y,z)$ will be a set of
points

$$S_c = \{ (x,y,z) \in \mathbb{R}^3 : \hat f(x,y,z) = c \}.$$

The set $S_c$ will be a surface in $\mathbb{R}^3$ (or more than one surface if the
density is multimodal at this level). On a graphics terminal we have
chosen to represent $S_c$ by intersecting it with a series of equally
spaced planes orthogonal to the x-axis, say, and then drawing the contours
defined by these intersections. The resulting "wire" diagrams
give a strong 3 dimensional impression. If color is available, several
contour levels may be simultaneously displayed by using a different
color for each level.

It  is  helpful  to  imagine  what  this  representation  looks  like  for 
trivariate Gaussian data. For the independent variable case, $S_c$ is simply
a sphere so that a display would show several concentric spheres
with  the  mode  located  at  the  center.  If  the  variables  are  correlated  we 
will  see  ellipsoids  rather  than  spheres. 

To represent the density estimate of four variables, $\hat f(x,y,z,t)$,
we look at the sets

$$S_{t,c} = \{ (x,y,z) \in \mathbb{R}^3 : \hat f(x,y,z,t) = c \}.$$

Here we have arbitrarily chosen one variable and placed it in a reference
frame which may conveniently be thought of as a "time" axis. By
looking at a time-lapse sequence of representations of $S_{t,c}$ we obtain a
useful view of the data which highlights important features such as
modes, outliers, symmetry, skewness, and covariance structure.

Again it is useful to construct this representation for quadravariate
Gaussian data. For a fixed contour level c, as t moves through the
relevant interval of support $(t_{\min}, t_{\max})$, $S_{t,c}$ will be a sequence of
initially expanding spheres (ellipsoids) which continue to grow until
the mode is reached and then contract, finally vanishing when $S_{t,c}$
becomes the null set.
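
The expanding and contracting shells can be checked numerically for the simplest case. The sketch below assumes a standard Gaussian in four variables (zero mean, identity covariance) and a contour level expressed as a fraction alpha of the modal value; the function name `shell_radius` is hypothetical. Under these assumptions the contour at time t is the sphere with squared radius -2 ln(alpha) - t^2, and it is empty when that quantity is not positive.

```python
import math

def shell_radius(t, alpha):
    """Radius of the contour shell at 'time' t for a standard 4-variate Gaussian.

    alpha is the contour level as a fraction of the mode (0 < alpha < 1).
    Returns None when the contour is the null set at this t.
    """
    r2 = -2.0 * math.log(alpha) - t * t
    return math.sqrt(r2) if r2 > 0 else None

# Sweep t through the interval of support for the 10% contour.
for t in (-2.5, -2.0, -1.0, 0.0, 1.0, 2.0, 2.5):
    print(t, shell_radius(t, 0.10))
```

The printed radii grow as t approaches the mode at t = 0 and shrink afterwards, with no shell outside the interval of support.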


We shall assume that our data samples are labeled so that supervised
clustering and discrimination are feasible. As a preliminary
step, side-by-side scatter diagrams may be displayed to get a rough
feeling  for  the  separability  of  cluster  classes.  This  may  also  be 
accomplished  by  displaying  side-by-side  density  contour  plots  for  the 
cluster  classes.  For  large  training  samples  the  latter  is  more  useful 
(the  scatter  diagram  might  indicate  no  separation  at  all). 

When  the  preliminary  density  estimates  have  been  refined  by  optimal 
data-based choices of smoothing parameters, classification may be accomplished
using a Bayesian classifier. Evaluation of the averaged shifted
histogram for each class involves only a bin location operation (subtraction
and division) and then a table lookup for each training class
(hash  function,  perhaps).  This  is  a computationally  efficient  operation 
although  large  memory  requirements  are  necessary  in  several  dimensions. 
Examples  are  given  next. 
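
The bin location and table lookup described above might look as follows. This is a sketch under stated assumptions, not the authors' code: each class density is assumed to be stored as a dictionary from integer bin indices to estimated density values (a hash table, as the text suggests), and the names `bin_index` and `classify`, the priors, and the table entries are hypothetical.

```python
def bin_index(point, origin, width):
    """Bin location: one subtraction and one division per coordinate."""
    return tuple(int((p - o) // w) for p, o, w in zip(point, origin, width))

def classify(point, tables, priors, origin, width):
    """Assign the class with the largest prior times estimated density."""
    key = bin_index(point, origin, width)
    # Empty bins are treated as zero density (an assumption of this sketch).
    scores = {c: priors[c] * tables[c].get(key, 0.0) for c in tables}
    return max(scores, key=scores.get)

# Hypothetical density tables on a common mesh with bin width 10 in each variable.
tables = {
    "spring wheat": {(0, 0, 0): 0.20, (1, 0, 0): 0.05},
    "barley": {(1, 0, 0): 0.30},
}
priors = {"spring wheat": 0.5, "barley": 0.5}
origin, width = (0.0, 0.0, 0.0), (10.0, 10.0, 10.0)
print(classify((5.0, 2.0, 3.0), tables, priors, origin, width))
print(classify((12.0, 2.0, 3.0), tables, priors, origin, width))
```

Storing only the nonempty bins keeps memory proportional to the occupied part of the mesh, which matters in several dimensions.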

3. Examples

3.1 Three-Dimensional Analysis Using Badhwar Data

We shall consider the scatter diagram approach as a preliminary
step towards producing a nonparametric classifier. The data are
trivariate and come from a model applied to individual pixels (1.1 acre)
using temporally measured Landsat data. Five acquisitions of 4-channel
remote sensing reflectance intensity data were converted into a single
"greenness" time series by looking at a certain linear combination of
the 4-channel data. The time series was fitted by Badhwar's [1] growth
model, which resembles a bell-shaped curve. For each pixel three parameters
from Badhwar's model were extracted: x, the time of peak greenness;
y, the ripening or reproduction period; and z, the peak greenness
level. Each measurement was recorded on a discrete scale from 0 to 249.
The data are processed in a segment, which is 5 by 6 nautical miles and
contains 22,932 (117 by 196) pixels. Ground truth was obtained by sending
observers to the fields.

The  first  group  of  examples  and  algorithms  will  be  illustrated 
using 1977 data from segment 1663, which is located in North Dakota.
The  segment  is  primarily  in  agricultural  use  and  contains  large  fields 
of  sunflower,  soybeans,  sugar  beets,  spring  wheat,  barley,  flax  and 
oats.  We  wish  to  observe  that  our  style  of  analysis  is  possible  in  a 
wide  range  of  remote  sensing  and  multivariate  applications.  These  data 
have  been  furnished  and  are  not  completely  unfamiliar  to  researchers. 

To simplify the presentation, we chose to analyze pure pixels from
three crops: sunflower, spring wheat, and barley. Sunflower is fairly
easy to distinguish from other crops, but spring wheat and barley are
less easy to distinguish. We have also chosen to analyze the three
estimated Badhwar parameters for these pixels. This segment contained
3694, 3811, and 892 pure pixels, respectively. In Figures 1a, 2a, and
3a, scatter diagrams of the pure pixels are shown for each crop. The
digital nature of the data is apparent from the display. Unfortunately
in this paper we cannot use the color cue to give the value of the third
orthogonal dimension. The axes are oriented as though we were looking
at the data from infinity along the vector (1,1,1). The Badhwar estimation
procedure produces clearly poor values for a small fraction of the
pixels. Pixels with Badhwar parameters falling outside certain ranges
for each crop were deleted from the analysis. The final numbers of pixels
considered were 3505, 3782, and 873, respectively. The mean vectors
and covariance matrices for these data were obtained using the usual
maximum likelihood estimates. The contours of the fitted trivariate
Gaussian density are shown in Figures 1b, 2b, and 3b, respectively. The
contours are nested ellipsoids and are drawn at the levels 10%, 35%, and
70% of the respective modal value. The graphing area is the same for
all views and crops. The same level contours of the estimated averaged
shifted histograms (3 shifts in each dimension) are shown in Figures 1c,
2c, and 3c, respectively. Remembering that the averaged shifted histogram
faithfully reproduces features in the data, we see some interesting
but not dramatically non-Gaussian features for the spring wheat and barley
data. What is not as clear is that the covariance matrix for the
sunflower data has been affected by outliers (in spite of the deletions
above): the Gaussian mode is 0.00014 while the ASH mode is 0.00061.
This illustrates the robust nature of the nonparametric density estimator.
For the other two crops, the respective modes differed by less
than 25%.

If we had more sophisticated displays, we could more clearly demonstrate
the separation of the sunflower density from the others. In fact
the separation is so large that even the overinflated Gaussian covariance
estimate does not result in overlap with the other densities. In
general,  we  should  not  expect  to  be  so  fortunate.  There  is  significant 
overlap  of  the  spring  wheat  and  barley  contours,  the  barley  having  a 
somewhat  larger  z-mean  (peak  greenness)  than  spring  wheat.  By  the  way, 
the  sunflower  density  is  forward-left  in  this  picture,  having  a larger 
x-mean  (later  time  of  peak  greenness).  These  features  are  more  easily 
seen  in  the  workstation  environment  described  below. 


We may ask what a mixture density would look like in this representation
for a field with roughly equal numbers of pure pixels of the
three crops. Using the estimated densities shown above, we formed the
mixture density by adding the three densities and dividing by three.
The contours (levels 1% and 10%) of the mixture density are shown in
Figure 4. The sunflower contribution appears only in the 1% contour and
not the 10% contour because of the inflated covariance matrix estimate.
The usefulness of graphics such as these in programs that fit even more
component densities seems clear and should be of great help in understanding
these procedures.

The next step in our data analysis is to use the estimated densities
to perform a pixel-by-pixel classification. The actual ground
truth layout makes it difficult to present or understand our results, so
we decided to use some modern statistical techniques to create a realistic
field of size 60 pixels by 60 pixels using the bootstrap technique.
The ground truth was chosen as shown in Figure 5 (unfortunately, the
colors are not distinguishable with B&W film, but the error pictures
indicate the pattern selected). The actual Badhwar parameters for each
pixel were selected from the real database by selecting a pure pixel at
random and assigning it to the "new" field. This bootstrapped field has
roughly equal numbers of pixels for the three crops, about 1200 pixels
each. The estimated densities were the same as in the previous discussion,
obtained from the full outlier-deleted data sets.
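
The construction of the bootstrapped field can be sketched in a few lines. The layout, the parameter triples, and the function name `bootstrap_field` below are hypothetical and serve only to illustrate the resampling step: each pixel of the chosen ground truth layout receives the parameters of a pure pixel drawn at random, with replacement, from the pool for its crop.

```python
import random

def bootstrap_field(truth, pools, rng):
    """Fill each pixel with the parameters of a randomly drawn pure pixel of its crop."""
    return [[rng.choice(pools[label]) for label in row] for row in truth]

# Hypothetical pools of (x, y, z) Badhwar parameters on the 0-249 scale.
pools = {
    "sunflower": [(200, 40, 120), (205, 38, 118)],
    "spring wheat": [(90, 60, 100), (95, 55, 104)],
}
# Hypothetical 2 x 2 ground truth layout (the paper uses 60 x 60).
truth = [["sunflower", "spring wheat"],
         ["spring wheat", "spring wheat"]]

field = bootstrap_field(truth, pools, random.Random(0))
print(field)
```

Passing an explicit `random.Random` instance keeps the generated field reproducible from run to run.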

The results of the classification are shown in Figures 5 and 6. In
Figure 6, the left column indicates how pixels were classified (ASH on
top; Gaussian below). To the right of these classification maps are
the error maps; black pixels indicate correctly classified pixels while
colored pixels indicate the crop misclassified. The misclassification
table is shown in Figure 5. Both density estimators classified nearly
100% of the sunflower pixels correctly, as expected. However, with the
small grain pixels, the averaged shifted histogram performed much better
than the Gaussian estimates, in spite of the rather modest non-Gaussian
features. For spring wheat, the ASH correctly classified 76.5% while
the Gaussian classified only 65.8%. For barley, the results were 73.8% and 76.7%,
respectively, the Gaussian density performing slightly better. In an
error reconciliation map, it is clear that both procedures misclassified
many of the same pixels, but the nonlinear boundary of the ASH procedure
resulted in superior performance.

The good performance of the ASH carries over to subsequent classification
smoothing algorithms. For example, if we use a simple majority
filter to smooth the previous pixel-by-pixel classification, we find the
results shown in Figures 7 and 8. With this smoothing, 100% of sunflower
pixels were correctly classified. For spring wheat, the results
were 93.7% and 80.3%, respectively. For barley, the results were 89.9%
and 93.7%, respectively. Clearly the Gaussian classifier has improved
its score on barley pixels at the cost of many misclassified spring
wheat  pixels.  This  is  the  result  of  the  non-Gaussian  features  in  the 
data.  We  do  not  expect  a substantial  amount  of  bias  in  these  estimates, 
but  we  have  not  pursued  this  point. 
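
A majority filter of the kind mentioned above can be sketched as follows. The window size is not stated in the text, so the 3 x 3 window (radius 1), the clipping of the window at the field border, and the rule of keeping the original label when no strict majority exists are all assumptions of this sketch, as is the function name `majority_filter`.

```python
from collections import Counter

def majority_filter(labels, radius=1):
    """Replace each label by the strict-majority label in its (2*radius+1)^2 window."""
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for i in range(rows):
        for j in range(cols):
            window = [labels[a][b]
                      for a in range(max(0, i - radius), min(rows, i + radius + 1))
                      for b in range(max(0, j - radius), min(cols, j + radius + 1))]
            label, count = Counter(window).most_common(1)[0]
            if 2 * count > len(window):   # strict majority; otherwise keep the label
                out[i][j] = label
    return out

# Hypothetical classification map with one isolated barley pixel in a wheat field.
noisy = [["w", "w", "w"],
         ["w", "b", "w"],
         ["w", "w", "w"]]
print(majority_filter(noisy))
```

Isolated misclassified pixels inside a homogeneous field are relabeled, which is the mechanism behind the improved percentages reported above.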


3.2 Workstations and Four-Dimensional Tree Example

The development of a workstation for the analysis of multivariate
data  is  a project  now  receiving  special  attention  due  to  the  recent 
acquisition  of  a sophisticated  computer  graphics  system.  The  Integrated 
Raster  Imaging  System  (IRIS)  by  Silicon  Graphics,  Inc.  provides  our 
research  at  Rice  with  real  time  graphics  capabilities.  With  the  IRIS, 
instantaneous  scalings,  rotations,  and  translations  of  colored  graphical 
images  can  be  performed.  The  addition  of  real  time  graphics  capability, 
which  was  lacking  with  our  previous  graphics  hardware,  not  only  enhances 
our  analysis  of  3 dimensional  data  but  also  provides  for  the  effective 
representation  of  4 dimensional  data. 

Our  representation  of  4 dimensional  data  is  accomplished  by  using 
the fourth component as a "time" component, as described in section 2.2.
For each interval in "time" we have a 3 dimensional data set (comprised
of  the  first  three  components)  which  can  be  separately  viewed  as  in  the 
previous  example.  By  displaying  the  smoothed  sets  of  3 dimensional  data 
in  sequence  we  effectively  represent  a data  set  of  dimension  4. 

Although  the  workstation  for  multivariate  data  analysis  is 
currently  in  a prototypical  stage  of  development,  we  present  here  its 
application  on  a data  set  provided  by  NASA.  The  raw  data  are  7 channel 
spectral  measurements  of  aspen  and  spruce  forests.  This  7 dimensional 
data  set  is  reduced  in  dimensionality  from  7 to  4 by  selecting  the  major 
4 principal  components  axes  as  subspace  projection  axes.  After  the 
reduction in dimension from 7 to 4 and then a rescaling of data for purposes
of graphical display, we calculate a sequence of contour shells
for  visual  inspection. 


In Figures 9-13 we illustrate how the workstation may be used in
the comparison of data sets. On the left hand side of the photographs
is the contour shell representation of the data from the aspen forests
only. Opposite, on the right hand side, are the contour shells for the
spruce forests only. Most apparent from the pictures are the observations
that (1) the locations of the modes of the two data sets are well
separated and (2) the shapes of the contours are different, meaning
the  covariance  structures  are  different.  A more  subtle  feature  of  the 
contour  shells  is  that  they  are  not  precisely  elliptical,  but  skewed, 
indicating  that  the  data  are  not  quite  Gaussian. 

4.  Conclusions 

We  have  attempted  to  illustrate  how  nonparametric  density  methods 
may  be  brought  to  bear  directly  on  multivariate  remote  sensing  problems. 
Multivariate  parametric  models  based  on  mixture  models  [4]  have  many 
advantages, both conceptually and in production mode. The fitting problems
in the parametric case are usually quite difficult. We hope to
investigate  how  nonparametric  models  may  provide  guidance  to  the  fitting 
and  verification  of  such  parametric  models.  This  would  be  a direct  use 
of  the  exploratory  capabilities  of  the  nonparametric  models. 

Workstations  are  an  exciting  development  for  statisticians  and  data 
analysts.  Our  figures  and  the  particular  data  sets  chosen  for  analysis 
give  one  just  a glimmer  of  what  type  of  analysis  will  be  possible  on  the 
workstation. Further work along these lines will focus on the nontrivial
problem of how to reduce high dimensional data sets to dimension no
greater than 4 for analysis on the workstation. The optimal reduction

Figures 5 and 6: Pixel-by-pixel classification
of bootstrapped scene. Figure 6 is a blowup of
the second and third rows in Figure 5. See text.

Figures 7 and 8: Result of majority smoothing
filter applied to the previous scene. See text.

Figure 11 (top): Continuation of sequence from Figure
10 with x^=0 . .

Figure 12 (bottom): Same contour shells as in Figure 9
with different rotation.

ABSTRACT

A general theory of image texture models is proposed and its applicability
to the problem of scene segmentation using texture classification
is discussed. A new algorithm, based on half-plane autoregressive filtering,
which optimally utilizes second order statistics to discriminate
between  texture  classes  represented  by  arbitrary  wide  sense  stationary 
random  fields  is  described.  Empirical  results  of  applying  this  algorithm 
to  natural  and  synthesized  scenes  are  presented  and  future  research 
is  outlined. 


INTRODUCTION 


The  purpose  of  this  paper  is  to  present  preliminary  theoretical 
and experimental results of our investigation into approaches to automatic
scene segmentation based on texture analysis. It is imperative
to  examine  fundamental  methodology. 


As  perceived  through  our  senses  of  touch  and  vision,  texture  is 
a property  of  surfaces  and  images  easy  to  assimilate  yet  difficult  to 
articulate.  Consistent  with  this  view  are  the  following  remarks: 

"Despite  its  importance  and  ubiquity  in  image  data,  a formal  approach 
or  precise  definition  of  texture  does  not  exist."  - Robert  Haralick, 
ref. [10].

"Texture  is  an  elusive  notion  which  mathematicians  and  scientists  tend 
to  avoid  because  they  cannot  grasp  it.  Engineers  and  artists  cannot 
avoid  it,  but  mostly  fail  to  handle  it  to  their  satisfaction."  - Benoit 
Mandelbrot, p. 310, ref. [18].

In  consideration  of  this  view,  it  may  be  argued  that,  for  the  purpose 
of  developing  computer  algorithms  to  perform  scene  segmentation  based 
on texture analysis, empirical descriptions of texture are more appropriate
than formal mathematical models. While acknowledging the utility
of specific ad hoc techniques developed using this approach, we question
the ability of this approach to provide a framework for developing techniques
comparable in performance and flexibility with the human visual
system.  Discussing  the  limitations  of  the  empirical  approach  of  the 
modern  behavioral  school  of  psychology,  in  ref.  [5]  Noam  Chomsky  remarks: 

"To go further, I believe that the inability of modern psychology to
come to grips with the problems of human intelligence is in part, at
least, a result of its unwillingness to undertake the study of abstract
structures and mechanisms of mind ..." "Had the physical sciences
limited themselves by similar methodological strictures, we would still
be in the era of Babylonian astronomy."

It is our opinion that formal mathematical models of texture are
necessary for developing algorithms and for understanding their performance.

Our  methodology  employs  three  processes.  First,  we  assert  a theory 
for  a class  of  image  textures.  This  theory  is  a metaphor,  based  on 
our  experience  with  the  physical  phenomena  of  image  texture,  which  draws 
a comparison with a mathematical model as described by a set of axioms.
This experience includes an inspection of myriad images, rich in textural
detail, arising in histology, ref. [16], the classic photography
of Phil Brodatz, ref. [2], [3], [4], Landsat imagery, ref. [8], and
many optical, infrared and microwave images at JPL's Image Processing
Laboratory. The mathematical model describes images which appear to
decompose into distinct connected regions such that each region is perceived
as homogeneous and distinct from the regions contiguous to it.

The  quality  of  homogeneity  occurs  when  a region  within  an  image  has 
a constant  brightness,  a repetitive  pattern  or  a random  pattern  which 
appears the same throughout the region. The model consists of associating
with each region a stationary, ergodic random field that accounts
for the texture class of the region. Contiguous regions are required
to  'belong  to'  different  texture  classes  though  more  than  one  region 
may  belong  to  the  same  texture  class.  The  image  brightness  values 
over  each  connected  region  correspond  to  a realization  of  the  random 
field  (a  choice  of  sample  values  for  each  of  the  random  variables 
corresponding  to  the  lattice  points  within  the  region)  over  the  region. 
Stationarity  accounts  for  the  homogeneous  property  of  the  image  over 
each  region  and  ergodicity  is  equivalent  to  the  assumption  that  the 
parameters  that  describe  the  random  fields  (the  joint  probability 
density  functions)  can  be  estimated  from  the  image  brightness  levels. 
Textbook accounts of random fields are found in ref. [1], [24], and
[25].  The  concept  of  stationarity  we  utilize  is  rather  general. 

For the purpose of developing the scene segmentation algorithm discussed
in this paper, which utilizes only second order statistics,
we assume stationarity in the wide sense (that the means and covariances
exist and are translation invariant). We have also developed a nonlinear
extension of the algorithm that assumes stationarity in the
strict sense (that the joint probability distributions are translation
invariant). Furthermore, we have extended the concept of stationarity
so our model subsumes both the fractal model discussed in ref.
[17],  [18],  and  [20]  and  the  structural  model  discussed  in  [21]. 

The  second  process  employed  by  our  methodology  is  a mathematical 
elaboration  of  the  model.  This  consists  of  an  exploration  of  logical 
consequences  (theorems)  of  the  axioms  describing  the  model  that  have 
a significant  predictive  value.  These  predictions  allow  formulation 
of experiments to test the theory (determine whether the models proposed
by  the  theory  are  consistent  with  observations)  and  they  provide  design 
criteria  and  design  architecture  for  algorithms  to  solve  the  problem 
of  scene  segmentation  based  on  texture  classification. 

Because the subject of random fields is in its infancy, it was
necessary to discover new mathematical results as well as examine
known results related to our model. The concept of representing image
textures by wide sense stationary random fields and the subsequent
application of autoregressive techniques to image processing problems
is not new, see ref. [15, p. 238-243], [22], [9], [19], and [23].
However, assumptions are usually made restricting the class of random
fields  to  which  the  techniques  are  applicable.  These  include  the 
following  assumptions: 

Assumption  1.  The  random  field  has  a joint  Gaussian  distribution. 

Assumption 2. The random field is described by a quarter plane
(causal)  finite  autoregressive  model. 

Our mathematical investigations resulted in a precise characterization
of the restrictions imposed by assumptions 1 and 2 above and a
method for extending autoregressive techniques to arbitrary wide sense
stationary random fields. Also, we generalized the autoregressive
technique to 1) an autoregressive technique applicable to wide sense
stationary random fields with values in a vector space (to include multispectral
imagery), and 2) a non-linear filtering technique, based on conditional
probabilities, applicable to arbitrary strict sense stationary
random fields with values in an arbitrary set (the set might consist
of 'local primitives' as defined by the structural approach, see ref.
[10],  [21]).  Each  of  the  techniques  above  can  be  proven  to  be  optimal 
in  the  sense  of  Bayes.  A complete  mathematical  treatment  of  these 
results  is  beyond  the  scope  of  this  paper  and  will  be  presented  in 
a final  research  report. 

The  third  process  consists  of  designing  experiments  and  making 
empirical  observations.  These  include  the  computer  implementation 
of  an  algorithm  for  scene  segmentation  based  on  texture  classification, 
utilizing  autoregressive  filtering,  which  is  described  in  Chapter 
2,  and  the  application  of  this  algorithm  to  natural  and  synthetic 
images,  which  is  described  in  Chapter  3.  Chapter  4 discusses  future 
experimental  research. 




2. SCENE SEGMENTATION ALGORITHM

The  purpose  of  this  chapter  is  to  describe  a new  algorithm,  based 
on  an  autoregressive  filtering  technique,  for  scene  segmentation  using 
texture  classification.  The  scene  model  assumptions,  which  are  quite 
general,  are  the  following. 

Assumption  1.  The  scene  or  image  data  consists  of  a real  valued 
function  X defined  on  a finite  subset  D of  a two  dimensional  lattice 
L. The function X represents the image brightness level.

Assumption  2.  The  set  D decomposes  into  a finite  union  of  disjoint 
connected  subsets,  called  regions.  To  each  region  corresponds  a wide 
sense  stationary  real  valued  random  field  on  L,  whose  second  order 
statistical  parameters  represent  the  texture  class  for  that  region, 
such  that  the  restriction  of  X to  each  region  arises  as  a specific 
realization  of  the  random  field  corresponding  to  that  region  over 
that  region. 

Assumption  3.  The  second  order  statistics  (consisting  of  the 
mean  value  and  the  autocovariance  function)  of  any  two  random  fields 
corresponding  to  contiguous  regions  are  not  identical. 

It  is  important  to  observe  that  we  do  not  impose  any  special 
properties  for  the  second  order  statistics.  Nor  do  we  assume  any 
specific  form  for  the  joint  probability  density  functions  of  the  random 
field  - in  particular  we  do  not  assume  they  are  Gaussian,  in  any  sense, 
or  even  stationary.  Furthermore,  it  follows  from  our  assumptions 
that  the  performance  of  any  algorithm  for  scene  segmentation  based 
on  these  assumptions  will  be  limited  by  its  ability  to  discriminate 
between texture classes based only on estimates of second order statistical
parameters. To support our contention that this does not impose
a practical limitation, we make the following observations. First,
there is substantial experimental evidence indicating that human visual
texture discrimination utilizes only second order statistics, see
ref. [13], [14]. Second, Fourier methods of texture analysis, which
utilize second order statistical information, have proved useful in a number
of applications, see ref. [6], [7]. These two observations suggest
that second order statistics yield a reliable basis for discriminating
between statistically distinct regions in natural imagery. Third,
it follows from a theorem we derived that if an algorithm utilizing
second order statistics yields a high confidence of correct classification
using Bayes' formula with Gaussian parameters, then the level
of confidence is also high if the actual distribution is non-Gaussian.
Fourth, the restriction to second order statistics reduces computational
complexity immensely. Fifth, the actual distribution of the
random field is usually not known (an exception occurs for synthetic
radar images of clutter backgrounds, which have a chi-square distribution
with two degrees of freedom).

The algorithm consists of the following steps.

Step 1. Select a scene X, natural or synthesized. We have developed
a computer routine to generate realizations of jointly Gaussian
distributed random fields whose spectra can be arbitrarily specified by
a finite sum of functions, each constant over a rectangle. It uses
the FFT algorithm to perform an approximation to Norbert Wiener's
stochastic integral representation of Gaussian random processes.

Step 2. Select training areas and estimate the means and
autocovariances for each of the N texture classes.

Step 3. Choose a finite set S which is a subset of the half-plane
$\{(m,n) : n > 0\} \cup \{(m,0) : m \ge 0\}$. For each $k = 1, \ldots, N$,
calculate a function $F_k : S \to R$ such that $F_k(0,0) = 1$ and the
variance of $F_k * R_k$ is minimized, where $R_k$ denotes the random
field for the k-th class.

Step 4. For each $k = 1, \ldots, N$, convolutionally filter the original
scene to form $X_k = F_k * X$.

Step 5. For each $k = 1, \ldots, N$, calculate
$Y_k = \log V_k^{1/2} + (X_k - M_k)^2 / 2V_k$,
where $V_k$ and $M_k$ are the variance and mean of $X_k$ as calculated
using parameters from steps 2 and 3.

Step 6. Choose a box size $B > 0$ and apply a B x B box filter
(average filter) to $Y_k$ to obtain $Z_k$.

Step 7. Classify pixel $(a,b) \in L$ into the j-th class if
$Z_j(a,b) < Z_k(a,b)$ for all $k \ne j$.

A flow chart depicting this algorithm is attached.

Based on the theory of half-plane prediction developed in refs.
[11], [12], this algorithm has the following properties:

Property 1. As the set S gets 'large' and the box size gets
large, the performance approaches that of an optimal Bayesian
classifier if each texture class is described by a jointly Gaussian field.

Property 2. If the performance of the algorithm is high, then the
level of confidence is at least as high as predicted for a jointly
Gaussian process.

Property 3. The algorithm has minimal computational complexity
among the class of all algorithms satisfying either property above.
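
The seven steps can be illustrated with a short sketch. It is not the authors' implementation: the four-point half-plane window `OFFSETS`, the least-squares fit used in place of autocovariance-based normal equations, the zero-padded box filter, and the two synthetic test textures are all assumptions chosen to keep the example small.

```python
import numpy as np

# Half-plane window S minus the origin, as (m, n) = (column, row) offsets.
# This 4-point choice is illustrative only; the paper uses 24 neighbors.
OFFSETS = [(1, 0), (0, 1), (1, 1), (-1, 1)]
PAD = 1  # largest |offset| appearing in OFFSETS

def shifted(img, m, n):
    """Interior view whose entry at p holds img(p - (m, n))."""
    h, w = img.shape
    return img[PAD - n:h - PAD - n, PAD - m:w - PAD - m]

def fit_class(train):
    """Steps 2-3: least-squares filter F_k with F_k(0,0) = 1, plus M_k, V_k."""
    mean = train.mean()
    z = train - mean
    target = shifted(z, 0, 0).ravel()
    design = np.stack([shifted(z, m, n).ravel() for m, n in OFFSETS], axis=1)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    f = -coef                         # F_k(q) for q != (0, 0)
    resid = target + design @ f       # F_k * (centered training field)
    m_k = mean * (1.0 + f.sum())      # mean of F_k * X over class k
    return f, m_k, resid.var()

def apply_filter(scene, f):
    """Step 4: X_k = F_k * X, evaluated on the interior of the scene."""
    out = shifted(scene, 0, 0).astype(float).copy()
    for (m, n), c in zip(OFFSETS, f):
        out += c * shifted(scene, m, n)
    return out

def box_filter(y, b):
    """Step 6: b x b moving average (zero padded at the borders)."""
    k = np.ones(b) / b
    y = np.apply_along_axis(np.convolve, 1, y, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, y, k, mode="same")

def segment(scene, trainings, b=7):
    """Steps 2-7: label image for the interior pixels of the scene."""
    z = []
    for train in trainings:
        f, m_k, v_k = fit_class(train)
        x_k = apply_filter(scene, f)
        y_k = 0.5 * np.log(v_k) + (x_k - m_k) ** 2 / (2.0 * v_k)  # Step 5
        z.append(box_filter(y_k, b))
    return np.argmin(np.stack(z), axis=0)                          # Step 7

def demo_accuracy(seed=0, h=64, w=128):
    """Two hypothetical textures side by side; returns fraction correct."""
    rng = np.random.default_rng(seed)

    def white(shape):
        return rng.standard_normal(shape)

    def smooth(shape):  # horizontally correlated, larger variance
        g = rng.standard_normal((shape[0], shape[1] + 1))
        return 4.0 * (g[:, 1:] + g[:, :-1]) / np.sqrt(2.0)

    truth = np.zeros((h, w), dtype=int)
    truth[:, w // 2:] = 1
    scene = np.where(truth == 0, white((h, w)), smooth((h, w)))
    labels = segment(scene, [white((64, 64)), smooth((64, 64))])
    return (labels == truth[PAD:-PAD, PAD:-PAD]).mean()
```

Most errors in such a demonstration are expected near the boundary between the two textures, where the box filter averages over both classes.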


3. Algorithm Evaluation

The autoregressive filtering algorithm was evaluated using various
synthesized images and some natural scenes. Two image synthesis
methods were implemented: a convolution method and a direct spectrum
synthesis method. For evaluation, the algorithm was applied to the
synthesized images with a fixed parameter setup: training area of
64 by 64 pixels, autoregression window of 24 neighbor pixels, and
smoothing window of 5 by 5 pixels.

The convolution method synthesizes an image as a function of
neighbor pixels, as described in equation 1.

(1)  B(i,j) = c1 A(i+1,j+1) + c2 A(i+1,j+2) +
              c3 A(i+2,j+1) + c4 A(i+2,j+2)

where A is the uniformly distributed random noise image array and c1, c2,
c3, c4 are arbitrarily assigned constants.
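
A minimal sketch of the convolution method follows. It assumes that equation 1 weights the 2 by 2 block of A at rows i+1, i+2 and columns j+1, j+2 with c1 through c4 (the printed third and fourth terms appear to contain typographical errors); the function name and the constants in the example are hypothetical.

```python
import numpy as np

def synthesize_by_convolution(a, c1, c2, c3, c4):
    """B(i,j) as a weighted sum of a 2 x 2 block of A offset by one pixel.

    Assumed form: B(i,j) = c1*A(i+1,j+1) + c2*A(i+1,j+2)
                         + c3*A(i+2,j+1) + c4*A(i+2,j+2).
    The output is two pixels smaller than A in each direction.
    """
    return (c1 * a[1:-1, 1:-1] + c2 * a[1:-1, 2:]
            + c3 * a[2:, 1:-1] + c4 * a[2:, 2:])

# Example: a 64 x 64 texture from uniformly distributed noise.
rng = np.random.default_rng(0)
noise = rng.uniform(0.0, 1.0, size=(66, 66))
texture = synthesize_by_convolution(noise, 0.4, 0.3, 0.2, 0.1)
```

Choosing the constants of two textures closer together makes the two synthesized images more similar, which is the situation examined in the robustness tests.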

Figure 3.1 shows the segmentation result for a uniformly distributed
random noise image and a synthesized image produced by the direct
convolution method. The segmentation result was found to be 85% correct.
Figures 3.2 and 3.3 illustrate the histogram distribution characteristics
of the two images.

More synthesized images using the convolution method were tested
with various c1, c2, c3, and c4 values. Figures 3.4 through 3.6
illustrate algorithm robustness against image similarities. When the
convolution constants of two synthesized images became closer to each
other (i.e., the images became more similar to each other), the
segmentation results became slightly worse.


The direct spectrum synthesis method was developed based on the
fact that if the spectrum of an image generated by convolving an
arbitrary function with Gaussian random noise is known, the image can
be synthesized directly from the spectrum. Equation 2 shows the
detailed relation.

(2)  B = F * G
       = APP. [Fourier(SQRT(S)) G]

where S is the user-defined spectrum,
      Spec(B) = S,
      F = Fourier(SQRT(S)),
      G is the Gaussian random noise.

Therefore, images with various spectral characteristics can be
synthesized by applying different spectrums. For implementation
convenience, the spectrum was generated as several rectangular shapes
of spectrum coefficients. Figures 3.7 and 3.8 illustrate the synthesized
images from the given spectrums.
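
The following sketch shows one common way to realize direct spectrum synthesis with the FFT. It is an illustration under assumptions, not the routine used in the study: the function names, the circular (periodic) filtering, and the rectangle coordinates are hypothetical.

```python
import numpy as np

def rectangular_spectrum(shape, rectangles):
    """Spectrum built as a sum of functions, each constant over a rectangle.

    Each rectangle is (row0, row1, col0, col1, level) in FFT bin indices.
    """
    s = np.zeros(shape)
    for r0, r1, c0, c1, level in rectangles:
        s[r0:r1, c0:c1] += level
    return s

def synthesize_from_spectrum(s, g):
    """Shape white Gaussian noise g so that its power follows s.

    The noise is multiplied by sqrt(s) in the frequency domain, which is
    a circular convolution B = F * G in the image domain. The real part
    is returned; this matters only when s is not symmetric about the
    origin of the frequency plane.
    """
    return np.fft.ifft2(np.sqrt(s) * np.fft.fft2(g)).real

rng = np.random.default_rng(0)
g = rng.standard_normal((64, 64))
s = rectangular_spectrum((64, 64), [(0, 4, 0, 4, 1.0), (61, 64, 0, 4, 1.0)])
texture = synthesize_from_spectrum(s, g)
```

A narrow rectangle concentrates the spectral energy and gives a strongly correlated texture; a broad rectangle approaches white noise.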

Figure 3.9 is the segmentation result for the two synthesized images.
The result shows perfect segmentation of the two textures. Figure 3.10
shows the effect of high frequency noise on texture classification.
Figures 3.11 through 3.14 show segmentation results for various
types of spectrum images with smoothing window sizes of 5 by 5 and 10
by 10. The enlarged smoothing window gave a small improvement.

It was found that the segmentation performance became poorer as
the spectrum distribution became broader. This is due to the relation
between the spectrum energy concentration characteristics and the
autoregression window size. The detailed relation is not yet defined and
requires further study.

techniques can be applied to make them modellable. This research will
examine the pre-processing techniques that can convert images into
modellable textures.

The second question may be answered simply: the training area
should represent the statistical and spectral characteristics
of the entire texture class, the autoregression window should be
large enough to capture the spatial relation among pixels for the given
class, and the smoothing window size should be selected so that scene
class boundary confusion and false classification are minimized.
However, the precise expression of these parameter sizes as functions
of the texture characteristics is not yet defined.

In order to understand the relation between the parameter sizes
and the texture characteristics, the algorithm will be applied to a large
set of images (synthesized textures and natural textures) with various
statistical and spectral characteristics, and the segmentation
performance will be evaluated with respect to the parameter sizes.

The third question can be answered only after human perception
is understood quantitatively. It may be impossible to understand
completely how a human perceives textures. In this research,
more detailed relations between human perception and the statistical
and spectral characteristics of natural scenes will be examined.

LET B BE ANY M BY M BLOCK OF PIXELS LYING WITHIN A REGION FOR SOME ONE OF N
TEXTURE CLASSES. LET v BE A VECTOR REPRESENTING THE IMAGE VALUES OVER B, LET
p(k), k = 1, ..., N BE THE A PRIORI PROBABILITIES FOR THE REGION CONTAINING B
TO BELONG TO THE k-th TEXTURE CLASS, LET p(v | k), k = 1, ..., N BE THE CONDITIONAL
PROBABILITIES (OR DENSITIES) FOR v, AND LET p(k | v) BE THE A POSTERIORI

DEFINITION (SHANNON). CONDITIONAL ENTROPY H(CLASS | v) = EXP [-

THEOREM 2 (VERY EASY). 0 ≤ H(CLASS | v) ≤ LOG N AND → 0 IFF
CLASSIFICATION → 1

DEFINITION. LET $X: \Omega \times L \to R$ BE A STATIONARY RANDOM FIELD. THE MEAN IS
$m = \mathrm{EXP}_{\omega \in \Omega}(X(\omega, o))$ FOR ANY $o \in L$ AND THE AUTOCOVARIANCE FUNCTION IS
$A: L \to R$ WHERE $A(p) = \mathrm{EXP}_{\omega \in \Omega}([X(\omega, p + q) - m][X(\omega, q) - m])$, ANY $q \in L$.

THE MEAN AND AUTOCOVARIANCE FUNCTION COMPRISE THE SECOND ORDER STATISTICS
OF A STATIONARY RANDOM FIELD.

WE  CONSIDER  SECOND  ORDER  STATISTICS  BECAUSE  THEY 

1)  COMPLETELY  CHARACTERIZE  (JOINTLY)  GAUSSIAN  RANDOM  FIELDS 

2)  ANY  ALGORITHM  WHICH  YIELDS  A HIGH  PROBABILITY  OF  CORRECT  CLASSIFICATION 
USING  SECOND  ORDER  STATISTICS,  UNDER  THE  ’GAUSSIAN  ASSUMPTION’,  WILL 
ALSO  PERFORM  WELL  IF  THIS  ASSUMPTION  IS  NOT  VALID  (BY  THEOREM  3) 

3)  RESTRICTION  TO  SECOND  ORDER  STATISTICS  REDUCES  COMPUTATIONAL  COMPLEXITY 
IMMENSELY 


JPL FOURIER TRANSFORMS AND SPECTRAL ANALYSIS

DEFINITION. $F_1: L \to R$ AND $F_2(u, v)$, $0 < u < 1$, $0 < v < 1$,
ARE FOURIER TRANSFORM PAIRS, $F_1 = \mathcal{F}(F_2)$, $F_2 = \mathcal{F}^{-1}(F_1)$, IF

$$F_1(a, b) = \int_0^1 \int_0^1 F_2(u, v) \exp(-2\pi i a u - 2\pi i b v)\, du\, dv, \qquad i^2 = -1,$$

$$F_2(u, v) = \sum_{(a,b) \in L} F_1(a, b) \exp(2\pi i a u + 2\pi i b v).$$

DEFINITION. THE SPECTRUM S OF A STATIONARY RANDOM FIELD X IS
$S(u, v) = \mathcal{F}^{-1}(A)(u, v)$ WHERE A IS THE AUTOCOVARIANCE FUNCTION FOR X.

REMARK. THE SPECTRUM IS ABSOLUTELY NOT THE FOURIER TRANSFORM OF AN
IMAGE

DEFINITION. LET $F: L \to R$ BE ZERO EXCEPT ON A FINITE SET $S \subset L$ AND LET
$X: \Omega \times L \to R$ BE A RANDOM FIELD. THE F-FILTER OF X IS A RANDOM FIELD
$F * X: \Omega \times L \to R$ GIVEN BY

$$F * X(\omega, p) = \sum_{q \in L} F(q)\, X(\omega, p - q).$$

THE F-FILTER IS QUARTER OR HALF-PLANE IF S LIES IN A QUARTER OR
HALF-PLANE. IT IS CALLED S-AUTOREGRESSIVE IF $(0, 0) \in S$ AND $F(0, 0) = 1$
AND THE VARIANCE OF F * X (AUTOCOVARIANCE FUNCTION OF F * X AT (0, 0))
IS MINIMIZED (WITH RESPECT TO ALL FILTERS WHICH ARE 1 AT (0, 0) AND ZERO
OUTSIDE S).

THEOREM 4 (CLASSICAL). $\mathrm{SPECTRUM}(F * X) = |\mathcal{F}^{-1}(F)|^2\, \mathrm{SPECTRUM}(X)$


JPL QUARTER PLANE AUTOREGRESSIVE 'MODELS'

A POPULAR MODEL FOR IMAGE TEXTURES CONSISTS OF A STATIONARY RANDOM FIELD X
SATISFYING THE FOLLOWING:

• THERE EXIST A QUARTER PLANE $Q \subset L$, A FINITE SUBSET $S \subset Q$, AND AN
S-AUTOREGRESSIVE FILTER F SUCH THAT THE AUTOCOVARIANCE OF F * X IS
ZERO EXCEPT AT $(0, 0) \in L$

• THE VARIANCE OF F * X IS MINIMUM WITH RESPECT TO ALL T-AUTOREGRESSIVE
FILTERS G * X FOR ALL $T \subset Q$

THEOREM 5 (EASY). $\mathrm{SPECTRUM}(X) = \sigma^2 / |\mathcal{F}^{-1}(F)|^2$

THEOREM 6 (SEVERAL COMPLEX VARIABLES). THE FILTER F IS STABLE (ROOTS OF
$\mathcal{F}^{-1}(F)(z_1, z_2)$ HAVE MODULUS > 1)

THEOREM 7 (ALGEBRAIC FIELD THEORY). $\mathcal{F}(\log(\mathrm{SPECTRUM}(X)))(p) = 0$
IF $p \notin Q$ AND $-p \notin Q$

REMARK. THESE MODELS SHOULD BE LESS POPULAR

THEOREM 8 (HELSON, H. AND LOWDENSLAGER, D.). IF X IS ANY RANDOM FIELD
AND $H \subset L$ IS ANY HALF-PLANE THEN, AS $S \subset H$ GETS LARGE,

$$\int\!\!\int \mathrm{SPECTRUM}(F_S * X) \to \sigma^2 = \exp \int\!\!\int \log(\mathrm{SPECTRUM}(X)(u, v))\, du\, dv,$$

WHERE $F_S$ IS THE S-AUTOREGRESSIVE FILTER.


[Figure: the lattice L, showing the points (m, n) for m = -1, ..., 2 and n = -1, 0, 1.]


STATISTICAL SCENE SEGMENTATION ALGORITHM

STEP 1. SELECT SCENE - NATURAL OR SYNTHESIZED

STEP 2. SELECT TRAINING AREAS AND ESTIMATE MEANS AND AUTOCOVARIANCES

STEP 3. CHOOSE A FILTER WINDOW S LYING IN A HALF-PLANE AND CALCULATE
AUTOREGRESSIVE FILTERS $F_k: S \to R$ FOR EACH TEXTURE CLASS $1 \le k \le N$

STEP 4. FILTER ORIGINAL SCENE WITH $F_k$ TO OBTAIN $X_k$ FOR $1 \le k \le N$

STEP 5*. APPLY TRANSFORMATION $X_k \to Y_k = \log\sqrt{2\pi V_k} + (X_k - M_k)^2 / 2V_k$
FOR $1 \le k \le N$

STEP 6. CHOOSE BOX SIZE L AND APPLY L x L BOX FILTER TO $Y_k$ TO OBTAIN $Z_k$
FOR $1 \le k \le N$

STEP 7. CLASSIFY PIXEL (a, b) INTO j-th CLASS IF $Z_j(a, b) \le Z_k(a, b)$
FOR ALL $k \ne j$

*$V_k$ AND $M_k$ ARE THE VARIANCE AND MEAN OF $X_k$ AS CALCULATED USING PARAMETERS
FROM STEPS 2 AND 3.

ALGORITHM PROPERTIES

• AS AUTOREGRESSIVE WINDOW SIZE AND BOX SIZE GET LARGE, PERFORMANCE
APPROACHES BAYESIAN (OPTIMAL) CLASSIFIER FOR JOINTLY GAUSSIAN
TEXTURES

• PERFORMANCE STILL VALID FOR NON-GAUSSIAN TEXTURES, AND ALGORITHM
IS OPTIMAL FOR UTILIZING SECOND ORDER STATISTICS

• ALGORITHM ACHIEVES PERFORMANCE WITH MINIMAL COMPUTATIONAL
COMPLEXITY


ALL THESE ASSERTIONS ARE MATHEMATICALLY PROVABLE!

JPL EXTENSIONS OF THEORETICAL RESEARCH

• DEVELOP OPTIMAL NONLINEAR CLASSIFICATION ALGORITHM (DONE)

• EXTEND MODEL TO MULTISPECTRAL IMAGERY (DONE)

• EXTEND MODEL TO SUBSUME FRACTAL MODELS (DONE)

• EXTEND MODEL TO SUBSUME STRUCTURAL MODELS (DONE)

• DETERMINE OPTIMAL AUTOREGRESSIVE WINDOW SIZE, BOX FILTER SIZE, AND
THEIR DEPENDENCE ON CONTEXTUAL KNOWLEDGE (INCOMPLETE)

• SYNTHESIZE STABLE HALF-PLANE AUTOREGRESSIVE FILTERS (DONE)
AND UTILIZE THEM FOR OPTIMAL IMAGE COMPRESSION (INCOMPLETE)

ABSTRACT

A Bayesian, or penalized maximum likelihood, approach to the problem
of estimating the parameters of a mixture of multivariate normal
distributions is proposed. The Bayesian formulation eliminates the problem of
singularities in the likelihood function and results in an attractive
EM-like procedure. Although the question of consistency is not settled,
it is suggested that the proposed method has certain advantages over both
the constrained and unconstrained maximum likelihood procedures.


1. INTRODUCTION

Let $X_1, \ldots, X_n$ be a random sample from a finite mixture density

$$f(x \mid \theta) = \sum_{i=1}^{m} q_i f_i(x \mid \theta_i),$$

where the component densities are d-dimensional multivariate normal and
the mixing proportions $q_i$ satisfy $q_i \ge 0$, $\sum_{i=1}^{m} q_i = 1$. We let
$\theta_i = (\mu_i, \Sigma_i)$ denote the mean and covariance of the i-th component density
and let $\theta$ denote the aggregate of all the parameters involved in the
mixture density, including $q = (q_1, \ldots, q_m)$. We assume throughout that
m is known. It will be convenient to consider also the precision matrix
$\tau_i = \Sigma_i^{-1}$, and we sometimes let $\theta_i = (\mu_i, \tau_i)$.

Maximum likelihood is the method of estimating the parameters $\theta$
which has recently attracted the most interest [8]. According to this
method, the estimate $\hat\theta = \hat\theta(X_1, \ldots, X_n)$ is the parameter value which
maximizes the log likelihood function

$$\ell(\theta) = \sum_{i=1}^{n} \log f(X_i \mid \theta).$$

Unfortunately, as simple examples show, the function $\ell(\theta)$ is unbounded,
and one must consider local maximizers of $\ell(\theta)$ or else modify $\ell(\theta)$ in
some way so as to produce a global maximizer. Hathaway [5] took the
second approach in proposing a constrained maximum likelihood estimator.
For mixtures of univariate normal densities, he developed an effective
computational procedure for finding a maximum of $\ell(\theta)$ subject to the
constraints

$$\sigma_i / \sigma_{i+1} \ge c, \qquad i = 1, \ldots, m,$$

where $\sigma_i$ is the i-th standard deviation, $\sigma_{m+1} = \sigma_1$, and $c > 0$ is a
constant chosen by the user. He also proved that $\ell(\theta)$ has a global
maximizer, subject to the above constraints, and that the global maximizer
is a strongly consistent estimator, as long as the true parameter satisfies
the given constraints. Redner [7] mentions a penalized likelihood
function of the form

$$\ell(\theta) - \lambda \sum_{i=1}^{m} \|\tau_i\|^k,$$

where $\lambda, k > 0$ and $\|\tau_i\|$ is a norm on symmetric d x d matrices.
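
The unboundedness of the ordinary log likelihood can be made concrete with a standard univariate example. The display below is an illustrative sketch for $m = 2$, $d = 1$, using $\phi(x; \mu, \sigma^2)$ (a symbol introduced here, not in the original) for the normal density with mean $\mu$ and variance $\sigma^2$. Fix $q_1, q_2 > 0$, $\mu_2$, and $\sigma_2$, and center the first component on the first observation by setting $\mu_1 = x_1$:

```latex
\begin{align*}
f(x_1 \mid \theta) &\ge q_1\,\phi(x_1; x_1, \sigma_1^2)
                     = \frac{q_1}{\sqrt{2\pi}\,\sigma_1}, \\
f(x_j \mid \theta) &\ge q_2\,\phi(x_j; \mu_2, \sigma_2^2) > 0,
                     \qquad j = 2, \ldots, n, \\
\ell(\theta) &\ge \log\frac{q_1}{\sqrt{2\pi}\,\sigma_1}
                + \sum_{j=2}^{n} \log\bigl[q_2\,\phi(x_j; \mu_2, \sigma_2^2)\bigr]
                \;\longrightarrow\; \infty
                \quad\text{as } \sigma_1 \to 0 .
\end{align*}
```

The second term does not depend on $\sigma_1$, so the bound grows without limit as the first component collapses onto a single observation. Constraints or penalties on the variances are meant to exclude exactly this behavior.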

Bayes solutions for common loss functions, such as quadratic loss,
appear to be computationally infeasible [3]. For example, assuming that
the mixing proportions are the only unknown parameters, and using the
Dirichlet prior distribution given in the next section, there is an
explicit formula for the Bayes solution with quadratic loss. However, it
contains $m^n$ terms and is not useful except for very small sample sizes.
The method proposed in the next section utilizes a prior density $g(\theta)$
of a certain form on the parameter $\theta$ and takes as the estimator the
mode of the posterior density

$$g(\theta \mid x_1, \ldots, x_n) = \frac{\bigl[\prod_{j=1}^{n} f(x_j \mid \theta)\bigr]\, g(\theta)}{\int_{\Theta} \bigl[\prod_{j=1}^{n} f(x_j \mid \theta)\bigr]\, g(\theta)\, d\theta}.$$

Equivalently, the estimator maximizes the penalized log likelihood function,

$$\ell_1(\theta) = \ell(\theta) + \log g(\theta).$$


Such a procedure can be justified in Bayesian theory as being the limit
as $\epsilon \to 0$ of Bayes solutions $\hat\theta_\epsilon$ corresponding to 0-1 loss functions

$$L_\epsilon(\hat\theta, \theta) = \begin{cases} 0 & \text{if } \|\hat\theta - \theta\| < \epsilon, \\ 1 & \text{if } \|\hat\theta - \theta\| \ge \epsilon. \end{cases}$$

It will be seen that $\ell_1$ is similar to, but is more elaborate than,
the penalized likelihood function suggested by Redner.


2. THE PRIOR DISTRIBUTION

Recall that $q = (q_1, \ldots, q_m)$ is the vector of mixing proportions
and that $\theta_i = (\mu_i, \tau_i)$ is the pair consisting of the mean vector and
precision matrix of the i-th component normal density.

Assumption 1: $q, \theta_1, \ldots, \theta_m$ are mutually independent.

Assumption 2: q has a Dirichlet distribution with hyperparameters
$\lambda_1, \ldots, \lambda_m > 0$. The prior density of q is

$$f_0(q) = \frac{\Gamma(\lambda_1 + \cdots + \lambda_m)}{\Gamma(\lambda_1) \cdots \Gamma(\lambda_m)}\; q_1^{\lambda_1 - 1} \cdots q_m^{\lambda_m - 1}.$$

Assumption 3: Given $\tau_i$, the prior distribution of $\mu_i$ is
d-variate normal $N_d(a_i, c_i \tau_i)$ with mean $a_i \in R^d$ and precision matrix
$c_i \tau_i$, where $c_i > 0$ is a hyperparameter. The prior distribution of $\tau_i$
is Wishart with $\nu_i > d - 1$ degrees of freedom and expected value $\nu_i h_i^{-1}$,
where $h_i$ is a positive definite matrix. Thus the joint prior density of
$\mu_i$ and $\tau_i$ is

$$f_i(\mu_i, \tau_i) \propto |\tau_i|^{(\nu_i - d)/2} \exp\Bigl\{ -\tfrac{c_i}{2} (\mu_i - a_i)^T \tau_i (\mu_i - a_i) - \tfrac{1}{2} \operatorname{tr} h_i \tau_i \Bigr\}.$$

The prior distributions given in Assumptions 2 and 3 are the standard
conjugate priors for multinomial probabilities and the parameters of the
normal-Wishart distribution of the sample mean and covariance [1].
Their use here is for mathematical convenience, rather than
because of any prior conviction as to their suitability. However, it is
apparent that the large number of hyperparameters involved ($\lambda_i$, $\nu_i$, $c_i$,
$a_i$, $h_i$) allows a great deal of flexibility in applications.

The penalized likelihood function corresponding to this prior is

$$\ell_1(\theta) = \sum_{j=1}^{n} \log f(x_j \mid \theta) + \sum_{i=1}^{m} \lambda_i \log q_i + \frac{1}{2} \sum_{i=1}^{m} (\nu_i - d) \log |\tau_i| - \frac{1}{2} \sum_{i=1}^{m} c_i (\mu_i - a_i)^T \tau_i (\mu_i - a_i) - \frac{1}{2} \sum_{i=1}^{m} \operatorname{tr} h_i \tau_i .$$

Here, we have eliminated terms which depend neither on the parameters
nor on the samples and, for convenience, have also replaced $\lambda_i$ in the
original definition of $f_0(q)$ by $\lambda_i + 1$.

3. GLOBAL AND LOCAL MAXIMA OF $\ell_1(\theta)$

The prior density of $\theta$ given in the preceding section is unbounded,
as is $\ell_1$, unless the hyperparameters satisfy $\lambda_i \ge 0$, $\nu_i \ge d$.
Therefore, these restrictions will be assumed for the remainder of this paper.
The ordinary likelihood function can be obtained by allowing $\lambda_i = 0$,
$\nu_i = d$, $c_i = 0$, $h_i = 0$ for each i. This corresponds to a posterior
distribution derived from an improper, noninformative prior.

Choices of the hyperparameters which guarantee a global maximizer
of $\ell_1(\theta)$ are given in the following theorem.

THEOREM 1. If $\nu_k > d$ and $h_k$ is positive definite for each k,
then $\ell_1(\theta)$ has a maximum.

PROOF:  Since  X..  £ 0, 


n , m 

e.q(e)  s Z log  max  fj(x4|64)  + **■  E (v— dnoglxJ 
1 j=l  i 1 J 1 * i=l  1 1 


1 


m 


- -n  Z trh.x. 
*■  i=1  1 1 


1 n x 

= max  [logjx^  l-(Xj-y^)  t..  (x..~p..)] 

1 ^ 


m 


+ Z [(v.-d)logjx.  1-  trh.-T.]} 
i=l  1 1 11 


For  each  i,  let  C^(0)  = (x  e Rd|  logl^.  |-{x-y -)TT.(x-y.i)  > loglxjJ  - 

(x-uk)TTk(x-uk)  for  each  kl,  let  <^(0)  be  the  number  of  samples  in 
c.{0),  and  let 
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S,(0)  = 2 (x-lJ.^T^K.-p.)* 

1 xjec.(0)  J 1 1 J 1 


Then 


where 


i m 

Me)  £ j l [A- (9 ) log | t . 1 
1 6 1=1  1 1 

A.(e)  = v-d-h^e)  and  B - (e) 

Me)  £ i 2 C(v--d)log| 
1 2 1 


- trB^eJx.] 

= h^  + s-(0)  . 

t.|  - trh-r-] 


+ A 2 [(v.+d+njloglx. I - trh.x-] 
2 1 ill 


Let  n(T.j)  and  p(t^ ) denote  the  largest  and  smallest  eigenvalues  of 
respectively.  If  p(t^)  ->  ~ or  n (t^)  •*  0 for  some  k,  then  the  term 
corresponding  to  xk  in  the  inequality  above  tends  to  -«  while  the 
other  terms  are  bounded.  Therefore,  there  is  an  r > 0 such  that 


sup  £-,(9)  - sup  £,(0)  < where 
0 1 0eGr  1 


9r  = {0  | j ^ n(Tk)  s p(tk)  £ r for  each 


k>. 


Represent  0r  as  Q * ^ * •**  * 

m -i 

2 q,  = 1),  and  i Ji.  = {(u«x*)  | - 
1=1  1 1 i i r 


whese  Q = {q  e Rm|q.  > 0 for  each  i 
< n (r.),  p{x.)  ^ r).  Let  tjT,.  be  the 


and 


one  point  compact! fi cation  of  50  that  ®f  e ip-j  tends  to  «*  if  and 

only  if  llpLjll*-.  If  0.  -*»,  then  f.txjje^  0 for  all  y,  thus. 
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by  allowing  ■<»  as  a value,  ^(e)  can  be  extended  continuously  to 

- Q * Fi  x **  * * iL>  and  has  a maximum  on  that  set,  say  at  F. 
r n rl  rm 


Suppose  F is  a point  at  infinity;  i.e.»  that  for  some  k 

Then  = 0,  because  otherwise  Jl^(F)  = Jl-j(F)  is  obviously  not 
decreased  by  replacing  y^  by  any  finite  value.  Therefore,  &j(9)  is 
maximized  by  a point  in  ©r.  QED. 

Unfortunately, as with other penalized likelihood functions,
the circumstances under which a consistent global maximizer of $\ell_1(\theta)$
exists are not known. Even if one exists, there is no procedure for
finding the global maximizer. Therefore, we must consider local maximizers.
The necessary conditions for a local maximizer of $\ell_1(\theta)$ are, for
$i = 1, \ldots, m$:

(2.2)
$$q_i = \frac{\displaystyle\sum_{j=1}^{n} \frac{q_i f_i(x_j \mid \theta_i)}{f(x_j \mid \theta)} + \lambda_i}{n + \lambda}, \qquad \text{where } \lambda = \sum_{i=1}^{m} \lambda_i,$$

(2.3)
$$\mu_i = \frac{\displaystyle c_i a_i + \sum_{j=1}^{n} \frac{q_i f_i(x_j \mid \theta_i)}{f(x_j \mid \theta)}\, x_j}{\displaystyle c_i + \sum_{j=1}^{n} \frac{q_i f_i(x_j \mid \theta_i)}{f(x_j \mid \theta)}},$$

(2.4)
$$\Sigma_i = \frac{\displaystyle h_i + c_i (\mu_i - a_i)(\mu_i - a_i)^T + \sum_{j=1}^{n} \frac{q_i f_i(x_j \mid \theta_i)}{f(x_j \mid \theta)}\, (x_j - \mu_i)(x_j - \mu_i)^T}{\displaystyle \nu_i - d + \sum_{j=1}^{n} \frac{q_i f_i(x_j \mid \theta_i)}{f(x_j \mid \theta)}}.$$


These equations are the basis for an EM-like iteration procedure defined
by evaluating the right hand sides with the current values of the
parameters to obtain updated values of the parameters. Each of the updated
parameters is a convex combination of some prior estimate and the EM
update for ordinary maximum likelihood estimation. Interestingly, the
updated $q_i$ is a convex combination of the EM update and the prior mode
of $q_i$, whereas the updated $\Sigma_i$ is a convex combination of the EM
update and the prior conditional mean

$$\frac{h_i + c_i (\mu_i - a_i)(\mu_i - a_i)^T}{\nu_i - d}$$

of $\Sigma_i$ given $\mu_i$, not the prior mode. Obviously, the larger the sample
size, the greater will be the weight given to the EM updates and the
less given to the prior estimates. When the update equation (2.4) for
$\Sigma_i$ is evaluated using the just updated value of $\mu_i$ in the products
$(x_j - \mu_i)(x_j - \mu_i)^T$ and $(\mu_i - a_i)(\mu_i - a_i)^T$, this successive substitutions
procedure is equivalent to the modified EM procedure suggested by
Dempster, Laird, and Rubin [4] for finding posterior modes. Hereafter,
we shall refer to this procedure as the generalized EM procedure (GEM).
The general convergence properties of the GEM procedure follow from
[8, Theorem 4.1]; more specifically, starting from any point $\theta^{(0)}$ in
parameter space, the sequence $\{\theta^{(k)}\}_{k=0}^{\infty}$ produced by the GEM procedure
converges to a nonempty, connected, compact subset of parameter space on
which the penalized likelihood $\ell_1(\theta)$ is constant, and on which the
equations (2.2)-(2.4) are satisfied.
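
One pass of the iteration can be sketched as follows. This is an illustrative reading of the update equations (2.2)-(2.4) rather than the authors' code: the function names, the array layout (hyperparameters stored as NumPy arrays indexed by component), and the use of the just updated mean in the covariance update are assumptions.

```python
import numpy as np

def component_pdf(x, mu, sigma):
    """d-variate normal density evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mu
    quad = np.einsum("nd,nd->n", diff, np.linalg.solve(sigma, diff.T).T)
    logdet = np.linalg.slogdet(sigma)[1]
    return np.exp(-0.5 * (quad + logdet + d * np.log(2.0 * np.pi)))

def weighted_densities(x, q, mu, sigma):
    """n x m array with entries q_i f_i(x_j | theta_i)."""
    return np.stack([q[i] * component_pdf(x, mu[i], sigma[i])
                     for i in range(len(q))], axis=1)

def gem_step(x, q, mu, sigma, lam, c, a, h, nu):
    """One update of (q, mu, Sigma) from equations (2.2)-(2.4)."""
    n, d = x.shape
    dens = weighted_densities(x, q, mu, sigma)
    w = dens / dens.sum(axis=1, keepdims=True)   # q_i f_i(x_j) / f(x_j)
    s = w.sum(axis=0)
    q_new = (s + lam) / (n + lam.sum())                       # (2.2)
    mu_new = (c[:, None] * a + w.T @ x) / (c + s)[:, None]    # (2.3)
    sigma_new = np.empty_like(sigma)
    for i in range(len(q)):
        dx = x - mu_new[i]                  # uses the just updated mean
        da = (mu_new[i] - a[i])[:, None]
        scatter = (w[:, i, None] * dx).T @ dx
        sigma_new[i] = ((h[i] + c[i] * da @ da.T + scatter)
                        / (nu[i] - d + s[i]))                 # (2.4)
    return q_new, mu_new, sigma_new

def penalized_loglik(x, q, mu, sigma, lam, c, a, h, nu):
    """Penalized log likelihood, up to an additive constant."""
    d = x.shape[1]
    val = np.log(weighted_densities(x, q, mu, sigma).sum(axis=1)).sum()
    val += (lam * np.log(q)).sum()
    for i in range(len(q)):
        tau = np.linalg.inv(sigma[i])
        da = mu[i] - a[i]
        val += 0.5 * (nu[i] - d) * np.linalg.slogdet(tau)[1]
        val -= 0.5 * c[i] * da @ tau @ da + 0.5 * np.trace(h[i] @ tau)
    return val
```

Iterating `gem_step` and monitoring `penalized_loglik` gives a simple check: the penalized log likelihood should not decrease from one iteration to the next.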

The next theorem assures that the GEM procedure converges to a
consistent local maximizer of $\ell_1(\theta)$, given a good enough starting value.

THEOREM 2. If the true parameter $\bar\theta$ is in the interior of the
parameter set, then there is a neighborhood N of $\bar\theta$ such that with
probability 1, if n is sufficiently large there is a unique solution $\hat\theta$ of
(2.2)-(2.4) in N and $\hat\theta \to \bar\theta$ as $n \to \infty$. Furthermore, with probability 1
for large n the GEM procedure converges to $\hat\theta$ if the starting point is
near enough to $\hat\theta$.

PROOF. The existence and uniqueness of a consistent local maximizer
is a consequence of a consistency theorem due to Chanda [2] (see also
Peters and Walker [6]). A simple modification of the proof of that
theorem shows that the Hessian $d^2 \ell_1(\theta)$ is negative definite at $\theta = \hat\theta$
for large n. Therefore, $\ell_1(\theta)$ is strictly concave in a neighborhood
of $\hat\theta$. The local convergence of the GEM procedure to $\hat\theta$ now follows
from the consistency theorem and Lemmas 1 and 2 of [7].

4. OVERMODELED MIXTURES

For mixture problems in which the number of normal components is not
precisely known, the present model is not appropriate from a Bayesian
point of view. However, it is possible that the penalized likelihood
function exhibits better numerical and statistical properties in this
situation than the ordinary likelihood function. To illustrate, suppose
that the model contains m normal components, but the true density is a
mixture of k < m normal components. Thus,

$$f(x \mid \bar\theta^{(k)}) = \sum_{i=1}^{k} \bar q_i f_i(x \mid \bar\theta_i) \qquad (\bar q_i > 0)$$

is the true density, and

$$f(x \mid \theta^{(m)}) = \sum_{i=1}^{m} q_i f_i(x \mid \theta_i)$$

is the model. Let the hyperparameters for the model satisfy $\lambda_i = 0$,
$\nu_i > d$, $c_i > 0$, $a_i \in R^d$, and $h_i$ positive definite for $i = 1, \ldots, m$.
By Theorem 2, there is a consistent solution
$\hat\theta^{(k)} = (\hat q_1, \ldots, \hat q_k, \hat\theta_1, \ldots, \hat\theta_k)$ of
equations (2.2)-(2.4) for the k component mixture. Let $\hat q_i = 0$,
$\hat\mu_i = a_i$, $\hat\Sigma_i = h_i / (\nu_i - d)$ for $i = k+1, \ldots, m$, and let
$\hat\theta^{(m)} = (\hat q_1, \ldots, \hat q_m, \hat\theta_1, \ldots, \hat\theta_m)$. Clearly $\hat\theta^{(m)}$ is a solution of (2.2)-(2.4)
for the m component mixture which is consistent in the sense that
$f(x \mid \hat\theta^{(m)}) \to f(x \mid \bar\theta^{(k)})$ as $n \to \infty$. In contrast, it is not known if there
is a consistent solution of the ordinary likelihood equations in this
situation.


5.  REMARKS  AND  CONCLUSIONS 

The remarks at the end of the preceding section suggest that in
cases where the number m of normal components is unknown, but a
reasonable upper bound can be assumed, one should take $\lambda_i = 0$, $\nu_i > d$, $c_i > 0$,
$h_i$ positive definite. Otherwise, the choice of the hyperparameters may
be guided by prior guesses at location and dispersion of the mixture
parameters. For example

$$E(q_i) = \frac{\lambda_i + 1}{\lambda + m}, \qquad
\operatorname{cov}(q_i, q_k) = -\frac{(\lambda_i + 1)(\lambda_k + 1)}{(\lambda + m)^2 (\lambda + m + 1)}, \qquad
\operatorname{var}(q_i) = \frac{(\lambda_i + 1)(\lambda - \lambda_i + m - 1)}{(\lambda + m)^2 (\lambda + m + 1)}$$

can be used to aid in choosing the $\lambda_i$, while the equation

$$E(\Sigma_i) = c_i \operatorname{var}(\mu_i)$$

(provided $\nu_i > d + 1$) can aid in choosing $c_i$.
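
As a small aid for choosing the Dirichlet hyperparameters, the prior moments of q can be tabulated numerically. The sketch below assumes the shifted parameterization used in this paper, in which q has Dirichlet parameters $\lambda_i + 1$; the function name is hypothetical.

```python
import numpy as np

def dirichlet_prior_moments(lam):
    """Prior mean vector and covariance matrix of the mixing proportions.

    Assumes q ~ Dirichlet(lam_1 + 1, ..., lam_m + 1), so the parameters
    sum to lambda + m.
    """
    alpha = np.asarray(lam, dtype=float) + 1.0
    total = alpha.sum()                      # lambda + m
    mean = alpha / total
    cov = (np.diag(mean) - np.outer(mean, mean)) / (total + 1.0)
    return mean, cov

# Example: three components with increasing prior weight.
mean, cov = dirichlet_prior_moments([1.0, 2.0, 3.0])
```

Larger values of the hyperparameters concentrate the prior around its mean, so prior guesses at both the proportions and their uncertainty determine a candidate choice.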

The procedures outlined herein may be especially useful in applications
such as crop inventories from satellite data. There, spectral
measurements may be sampled from a large ground area (segment) which is itself
chosen from a large number of possibilities. The normal mixture model has
often been used for the distribution of spectral responses from particular
segments. Thus the parameters $(q, \theta_1, \ldots, \theta_m)$ can be considered
characteristic of segments, while the prior distribution of these
parameters can reflect their variability among the possible choices of
segments. Since there are "ground truth" segments available in which each
pixel has a known class identity, it is possible that the hyperparameters
of the prior distribution could be estimated from the ground truth segments.

Further research into the numerical and statistical properties of the
GEM procedure is planned. The properties to be studied include the
consistency of the global maximizer, the behavior of the GEM procedure
for overmodeled mixtures, and the sensitivity of the procedure to
starting values, for various choices of the hyperparameters.

ABSTRACT


Multi-channel  satellite  image  data,  available  as  LANDSAT  imagery, 
are  recorded  as  a multivariate  time  series  {four  channels,  multiple 
passovers)  in  two  spatial  dimensions.  We  consider  here  the  application 
of  parametric  empirical  Bayes  theory  to  classification  of,  and  estimating 
the  probability  of,  each  crop  type  at  each  of  a large  number  of  pixels. 
This  theory  involves  both  the  probability  distribution  of  imagery  data, 
conditional  on  crop  types,  and  the  prior  spatial  distribution  of  crop 
types.  For  the  latter  we  use  Markov  models  indexed  by  estimable 
parameters.  A broad  outline  of  the  general  theory  reveals  several 
questions  for  further  research.  Some  detailed  results  are  given  for  the 
special  case  of  two  crop  types  when  only  a line  transect  is  analyzed. 
There is also a detailed discussion of estimation of an underlying
continuous process on the lattice, which would be applicable to such
quantities as crop yield.

1. Introduction


Multi-channel  satellite  image  data,  available  as  LANDSAT  imagery, 
are  recorded  as  a multivariate  time  series  (four  channels,  multiple 
passovers)  in  two  spatial  dimensions.  We  consider  here  the  application 
of  parametric  empirical  Bayes  theory  to  estimating  the  probability  of 
each  crop  type  at  each  of  a large  number  of  pixels. 

Parametric  empirical  Bayes  modeling  has  proven  effective  in  various 
spatial  applications  previously.  For  example,  applications  have  been 
made  in  revenue  sharing  (small  census  areas),  insurance  (territories), 
false  alarm  estimation  (neighborhoods  in  a city),  epidemiology  (cities), 
and  forestry  (areas),  as  reported  respectively  by  Fay  and  Herriot  [11] 
and  Morris  [16],  Carter  and  Rolph  [7],  Efron  and  Morris  [9],  and  Burk  and 
Ek  [6].  Such  applications  could  be  further  improved  by  better  use  of 
neighborhood  information,  as  by  use  of  affinity  matrices  (Morris  and 
Kostal  [17]),  based  on  ideas  in  Stein  [19].  Much  more  substantial
advances  are  needed  for  LANDSAT  data.

We define "Empirical Bayes Modeling" to be the convolution of two
families of distributions, one for the measurement data $x$ given unknown
parameters $\theta$,

(1.1)  $x \mid \theta \sim f(x \mid \theta), \quad \theta \in \Theta$,

and the second for the unknown parameters

(1.2)  $\theta \sim \pi_\alpha(\theta), \quad \alpha \in \mathcal{A}$.

Expression (1.1) provides the likelihood, whilst (1.2) defines a family
$\Pi$ of densities or, in the "Parametric Empirical Bayes" case, a
parametric family indexed by $\alpha \in \mathcal{A}$. The problem is to
make an inference about $\theta$ from $x$, knowing $\mathcal{A}$. If
$\mathcal{A}$ corresponds to all possible distributions on $\theta$, then
standard frequentist methods are appropriate; and, if $\mathcal{A}$ has
but one element, then a pure Bayes approach for the single known prior
distribution is appropriate. Different (empirical Bayes) procedures arise
if $\mathcal{A}$ contains many elements, but not too many. The key is that
one can learn about $\alpha$ by considering the marginal density of the
data

(1.3)  $h_\alpha(x) = \int f(x \mid \theta)\, \pi_\alpha(\theta)\, d\theta$.

This  empirical  Bayes  approach  provides  some  conceptual  advantages 
for  LANDSAT  data,  because  it  suggests  that  we  separate  the  satellite 
measurement  process  x and  the  ground  truth  image  process  0.  The  major 
part  of  spatial  correlation  is  in  the  ground  truth  process:  neighboring 

pixels  are  likely  to  have  the  same  crop  type.  The  distribution  (1.1)  of 
the  measurement  process  normally  would  involve  much  less  correlation,
although  the  theory  does  not  require  this. 

The empirical Bayes approach accepts the fact that available satellite
and ground truth data (U.S. sites) can be used to estimate the functions
$f(\cdot \mid \theta)$. Thus, for example, extraction of "greenness"
and/or "brightness" functions from the four channels is a legitimate
operation from LANDSAT data, as are the appropriate time series
transformations, e.g. "Badhwar numbers". However, the appropriate prior
distribution $\pi_\alpha(\cdot)$ for the ground truth (crop types) for a
target site may be very different from that, $\pi_{\alpha_0}(\cdot)$, at a
training site. Crop sizes and relative proportions may vary widely in
different states and countries, and thus it would be unwise to assume
$\alpha = \alpha_0$. The empirical Bayes approach then provides an
appropriate method of estimation, in which (1.2) enables us to restrict
drastically the probable spatial arrangements on $\theta$.

With these considerations in mind, in this paper we analyze situations
where the density (1.1) is assumed known, and where particular families
$\pi_\alpha(\cdot)$ are chosen. (For further comment, see Section 5.)

Section 2 concentrates on the relatively simple situation of discrete
Markovian (correlated) $\theta$ corresponding to two crop types, on a
transect, and takes $f(\cdot \mid \theta)$ to be known. We learn there
that the posterior log odds on each crop type can be approximated by a
"moving average"; that the correct moving average depends on
$\alpha = (L_0, L_1)$, the average lengths of fields of type 0 and type 1;
and that $\alpha$ can be estimated without access to ground truth data.
The theory suggests how one might proceed in more realistic, more
complicated situations. Section 4 addresses the more complicated
two-dimensional situations with discrete $\theta$ using Gibbsian Markovian
distributions, as described by Besag [4] and Geman and Geman [12].
Section 3 considers related empirical Bayes theory for continuous
autoregressive ground truth parameters $\theta$. Section 5 contains
general remarks, including an outline of the further research needed to
bring our approach to the point of providing viable software for automatic
processing of LANDSAT data.

2.  Two-Crop  Models  on  the  Transect 

On a (lattice) transect, as illustrated in Figure 2.1, let
$\theta = (\theta_1, \theta_2, \ldots)$ indicate the sequence of crop
types, where $\theta_i = 0$ or 1. The measurement vector $x_i$ (possibly
multivariate) corresponds to a LANDSAT reading for pixel $i$. We assume
that $x_i$ has density $f_{\theta_i}(\cdot)$, with both $f_0$ and $f_1$
known. For fixed $\theta$, the $x_i$'s are assumed to be independent.
Then, letting $Z_i = \log\{f_1(x_i)/f_0(x_i)\}$ be the log likelihood
ratio for the $i$-th pixel, we can write the whole likelihood function as

(2.1)  $L_Z(\theta) = \exp\left(\sum_i \theta_i Z_i\right)$,

ignoring an irrelevant constant multiplier.
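
As an illustrative sketch (not part of the original paper), the log
likelihood ratios $Z_i$ and the logarithm of (2.1) can be computed as
follows. The function names, the five readings, and the two unit-variance
normal densities standing in for $f_0$ and $f_1$ are assumptions made only
for this example.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2); stands in for a known log f_0 or log f_1."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)

def log_likelihood_ratios(xs, logf0, logf1):
    """Z_i = log{f_1(x_i) / f_0(x_i)} for each pixel reading x_i."""
    return [logf1(x) - logf0(x) for x in xs]

def log_likelihood(theta, z):
    """Logarithm of (2.1): sum_i theta_i * Z_i, up to an additive constant."""
    return sum(t * zi for t, zi in zip(theta, z))

# Hypothetical example: f_0 = N(0, 1), f_1 = N(1, 1), five pixels on a transect.
xs = [0.1, 0.3, 1.2, 0.9, -0.2]
z = log_likelihood_ratios(
    xs,
    lambda x: normal_logpdf(x, 0.0, 1.0),
    lambda x: normal_logpdf(x, 1.0, 1.0),
)
value = log_likelihood([0, 0, 1, 1, 0], z)
```

With these two densities each $Z_i$ reduces to $x_i - 0.5$, so the crop
sequence (0, 0, 1, 1, 0) is scored by the sum of the third and fourth
ratios.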

FIGURE  2.1  Transect  model  notation 


A simple class $\Pi$ of prior distributions on the binary crop indicator
parameters is the stationary random walk (first-order binary Markov
process) with transition probabilities

(2.2)  $\Pr(\theta_{i+1} = 1 \mid \theta_i) = p_{\theta_i} = 1 - q_{\theta_i}, \quad \theta_i = 0, 1$.

This defines a two-parameter family $\Pi$ indexed by $p_1 = 1 - q_1$ and
$p_0 = 1 - q_0$. Equivalently, one may parameterize with $L_1 = 1/q_1$ and
$L_0 = 1/p_0$, the expected lengths of segments for crop type 1 and crop
type 0 (under (2.2) a type 1 segment ends with probability $q_1$ and a
type 0 segment with probability $p_0$); or with
$\pi_1 = L_1/(L_1 + L_0) = 1 - \pi_0$ and $\nu = 2/(L_0 + L_1)$, the
relative proportion of crop 1 ($\pi_1$ = marginal probability of type 1 at
a given pixel) and the segment intensity parameter ($n\nu$ = expected
number of segments in a transect of length $n$).

The induced prior density on $\theta$ is proportional to the natural
exponential family density

(2.3)  $\exp(\phi_1 N_1 - \phi_2 F)$, where

$N_1 = \sum_{i=1}^{n} \theta_i$ = number of pixels of crop type 1,
$F = \sum_i (\theta_i - \theta_{i+1})^2$ = number of segments - 1,

and $\phi_1 = \log(p_1/q_0)$, $\phi_2 = \tfrac{1}{2}\log(p_1 q_0/(q_1 p_0))$;
endpoints are ignored in (2.3). Note that $N_1$ and $F$ are not
independent: $F$ is restricted by the value of $N_1$. Also,
$E N_1 = n\pi_1$ and $E F = n\nu$ if there are $n$ pixels in the transect.
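
The alternative parameterizations can be checked numerically. The sketch
below is an illustration only: it takes (2.2) as the definition of $p_0$
and $p_1$, so that a type 1 segment ends with probability $q_1 = 1 - p_1$
and a type 0 segment with probability $p_0$, and the function names are
hypothetical.

```python
import math

def transect_prior_parameters(p0, p1):
    """Convert the transition probabilities of (2.2) to the other parameterizations."""
    q0, q1 = 1.0 - p0, 1.0 - p1
    L1 = 1.0 / q1                    # expected length of a type 1 segment
    L0 = 1.0 / p0                    # expected length of a type 0 segment
    pi1 = L1 / (L1 + L0)             # marginal probability of type 1
    nu = 2.0 / (L0 + L1)             # segment intensity
    phi1 = math.log(p1 / q0)
    phi2 = 0.5 * math.log(p1 * q0 / (q1 * p0))
    return {"L0": L0, "L1": L1, "pi1": pi1, "nu": nu, "phi1": phi1, "phi2": phi2}

def sufficient_statistics(theta):
    """N_1 (number of type 1 pixels) and F (number of segments minus one)."""
    n1 = sum(theta)
    f = sum((a - b) ** 2 for a, b in zip(theta, theta[1:]))
    return n1, f

params = transect_prior_parameters(0.1, 0.8)
stats = sufficient_statistics([0, 0, 1, 1, 1, 0, 1])
```

For $p_0 = 0.1$ and $p_1 = 0.8$ the expected field lengths are 10 pixels
for type 0 and 5 pixels for type 1, so one third of the pixels are expected
to be of type 1.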

The posterior density of $\theta$, given both $x$ and the unknown
$(p_0, p_1) \equiv \alpha$, is obtained by multiplying (2.1) and (2.3),
and has logarithm (ignoring an additive constant that does not depend on
$\theta$)

(2.4)  $\ell(\theta \mid x, \alpha) = \sum_i \theta_i (Z_i + \phi_1) - \phi_2 \sum_i (\theta_i - \theta_{i+1})^2$.

The empirical Bayes viewpoint acknowledges that $\phi_1$, $\phi_2$ are
unknown, but that the marginal distribution of $Z$ may be used to estimate
them.

For example, consider the case of normal, homoscedastic measurements,
$x_i \sim N(\mu_{\theta_i}, \sigma^2)$, for which
$Z_i = \delta(x_i - \mu)/\sigma$ with $\delta = (\mu_1 - \mu_0)/\sigma$
and $\mu = (\mu_0 + \mu_1)/2$. Assuming $n$ pixels, define
$\bar{Z} = \sum Z_i/n$ and
$r_j = \sum (Z_{i+j} - \bar{Z})(Z_i - \bar{Z})/n$ to be, respectively,
the mean and the $j$-th (unnormalized) autocorrelation. Given $\theta_i$,
$Z_i$ is normal with mean $\delta^2(\theta_i - .5)$ and variance
$\delta^2$, so that, with respect to the marginal distribution of $Z$,

(2.5)  $E\bar{Z} = \delta^2(\pi_1 - .5)$,  $E r_j = \delta^4 \pi_0 \pi_1 (p_1 - p_0)^j$,  $j = 1, 2, \ldots$.

One then can estimate $\pi_1$ and $p_1 - p_0$, hence $\alpha$, from
$\bar{Z}$ and $r_1$ by equating to expectations, that is without ground
truth data. The autocorrelations $r_2, r_3, \ldots$ can be used to check
the order of the Markov model. One way to develop a parametric empirical
Bayes estimate, which is satisfactory for large samples, is to use the
estimated $\hat{\alpha}$ in (2.4), and proceed in a Bayesian fashion.
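
A minimal sketch of this moment-matching step follows. It assumes the
normal model with known $\delta$ and uses the moment relations
$E\bar{Z} = \delta^2(\pi_1 - .5)$ and
$E r_1 = \delta^4 \pi_0 \pi_1 (p_1 - p_0)$, which follow from that model;
the function name is hypothetical, and in small samples the estimates may
fall outside $[0, 1]$ and need truncation.

```python
def estimate_prior_from_moments(zbar, r1, delta):
    """Method-of-moments estimates of (pi1, p1 - p0, p0, p1) from Z-bar and r_1."""
    pi1 = 0.5 + zbar / delta ** 2
    pi0 = 1.0 - pi1
    lam = r1 / (delta ** 4 * pi0 * pi1)   # estimate of p1 - p0
    # Stationarity of the chain gives pi1 = p0 / (1 - (p1 - p0)).
    p0 = pi1 * (1.0 - lam)
    p1 = p0 + lam
    return pi1, lam, p0, p1

# Feed in the exact expectations for p0 = 0.1, p1 = 0.8, delta = 1.5;
# the estimator should then return the true values.
true_p0, true_p1, delta = 0.1, 0.8, 1.5
true_pi1 = true_p0 / (true_p0 + 1.0 - true_p1)
zbar = delta ** 2 * (true_pi1 - 0.5)
r1 = delta ** 4 * (1.0 - true_pi1) * true_pi1 * (true_p1 - true_p0)
pi1_hat, lam_hat, p0_hat, p1_hat = estimate_prior_from_moments(zbar, r1, delta)
```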

There are at least two approaches to the use of (2.4). Various authors
have recommended choosing $\theta$ to maximize (2.4), subject to
$\theta_i = 0, 1$. We have taken a different approach, using (2.4) to
develop a formula for

$P(\theta_i = 1 \mid \text{Data}, \alpha) \equiv P_{1i} = 1 - P_{0i}$.

An approximation to the exact formula, good in the normal case if
$\delta$ is small or moderate, is the logistic spatial moving average

(2.6)  $\log(P_{1i}/P_{0i}) = \log(\pi_1/\pi_0) + W_0 Z_i + W_1(Z_{i-1} + Z_{i+1}) + W_2(Z_{i-2} + Z_{i+2}) + \cdots$

with

(2.7)  $W_0 = 1$,  $W_j = (p_1 - p_0)^j$,  $j = 1, 2, \ldots$.

The $P_{1i}$ value determined in this way, with $p_1$ and $p_0$ estimated
from (2.5), is a parametric empirical Bayes estimate for the required
probability.
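
The following sketch shows how such a logistic spatial moving average
might be evaluated on a transect. It assumes that the log odds take the
form $\log(\pi_1/\pi_0) + \sum_j W_j (Z_{i-j} + Z_{i+j})$ with the weights
of (2.7), which is a reading of (2.6) consistent with the thresholding rule
discussed next; terms falling outside the transect are simply dropped, and
the function name and `max_lag` argument are hypothetical.

```python
import math

def logistic_moving_average(z, p0, p1, max_lag=None):
    """Approximate P(theta_i = 1 | data) for every pixel of a transect."""
    n = len(z)
    lam = p1 - p0                          # W_j = lam ** j, as in (2.7)
    pi1 = p0 / (p0 + 1.0 - p1)             # stationary probability of type 1
    offset = math.log(pi1 / (1.0 - pi1))
    if max_lag is None:
        max_lag = n - 1
    probs = []
    for i in range(n):
        s = z[i]                           # W_0 = 1
        for j in range(1, max_lag + 1):
            w = lam ** j
            if i - j >= 0:
                s += w * z[i - j]
            if i + j < n:
                s += w * z[i + j]
        probs.append(1.0 / (1.0 + math.exp(-(offset + s))))
    return probs

flat = logistic_moving_average([0.0, 0.0, 0.0], 0.2, 0.8)
independent = logistic_moving_average([1.0, -2.0], 0.3, 0.3)
```

When $p_1 = p_0$ the crop types are independent, all weights beyond $W_0$
vanish, and the formula reduces to the usual pixel-by-pixel logistic rule.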

One possible approximation to the $\theta^*$ which maximizes (2.4) would
be to set $\theta_i^* = 1$ if $P_{1i} > \tfrac{1}{2}$, otherwise zero.
This is equivalent to setting $\theta_i^* = 1$ if
$W'Z > \log(\pi_0/\pi_1) = \log(q_1/p_0)$, where $W'Z$ denotes the
weighted sum of the $Z$'s in (2.6). The scanty numerical work we have done
thus far does not contradict the near equivalence of the two methods. If
this approximation is good it would improve on any complicated algorithm
required to maximize (2.4); see Section 4. Note that the maximization
problem (2.4) can be thought of in non-Bayesian terms as maximizing the
log likelihood $\sum \theta_i Z_i$ subject to two constraints, with
$\phi_1$ and $\phi_2$ as the associated coefficients, $\phi_2$ being a
penalty coefficient limiting the number of segments.

The weight $W_j$ may be estimated by
$\hat{W}_j = r_j/(\delta^4 \hat{\pi}_0 \hat{\pi}_1)$, with $\hat{\pi}_1$
the estimate for $\pi_1$, in the normal distribution case. This may be
more robust than (2.7) if the actual Markov process for $\theta$ has order
higher than one.

Formula (2.6) develops probabilities from the "spatially moving average"

(2.8)  $W_0 Z_i + \sum_{j \ge 1} W_j (Z_{i+j} + Z_{i-j})$.

Many proposed spatial estimation methods have been based on (2.8), with
$Z_{i+j} + Z_{i-j}$ replaced by more complicated forms in two dimensions.

Switzer ([20]) suggested estimating the $W_j$ by discriminant analysis,
or logistic regression, using ground truth data. This is unsatisfactory,
from the empirical Bayes viewpoint, if the true crop distribution
characteristic $(L_0, L_1)$ in the site to which the formula is to be
applied differs substantially from the characteristics pertaining to the
available ground truth data.

The empirical Bayes viewpoint is that the weights $W_0, W_1, \ldots$
should be estimated from the marginal distribution of the data in the
targeted area. Discriminant analysis will be satisfactory only if ground
truth patterns are similar for training and application sites.
3.  Empirical  Bayes  Modelling  with  Spatial  Autoregressive  Priors

Parametric empirical Bayes applications are more common, and more easily
developed, in situations with continuous parameters. Such rules also
provide moving average estimates of the form (2.8). Our earlier work
(Morris and Kostal [17]) studied both cases where the weights $W_j$ are
partially determined by "affinities", independent of the data, and cases
where localized shrinkage factor estimators were obtained with the $W_j$
determined separately for each pixel. Each of those approaches resulted in
pixel-dependent moving averages. In this section we extend the earlier
results to include parametric empirical Bayes estimators which derive from
explicit spatial models.

We suppose that on a $p \times q$ lattice there is a mean process
$\{\mu_{ij} : i = 1, \ldots, p;\ j = 1, \ldots, q\}$, which is not
directly observed. For example, $\mu_{ij}$ might be the yield of a
particular crop at the pixel labeled $(i,j)$. Corresponding to each
$\mu_{ij}$ is an observation $x_{ij}$ obtained from LANDSAT imagery data.
Our Bayesian model then consists of two parts: a distribution for the
observations, namely that the $x_{ij}$ are independent
$N(\mu_{ij}, \sigma^2)$, and a structural prior distribution

(3.1)  $\mu \sim N_n(m, \tau^2 A)$.

Here $\mu$ is a vector version of the $n = pq$ means $\mu_{ij}$.

The posterior distribution for $\mu$ is

(3.2)  $\mu \mid X \sim N_n((I-B)X + Bm,\ (I-B)\sigma^2)$,

where $I$ is the $n \times n$ identity matrix and

(3.3)  $B = \sigma^2(\sigma^2 I + \tau^2 A)^{-1}$.

The Bayesian estimate of $\mu$ is taken to be the posterior mean. The
whole posterior distribution may be used for inference or discrimination,
as appropriate to the context.

The empirical Bayes approach presumes that while we accept the form of
the structural distribution (3.1), the parameter values are not known a
priori. The observation vector $X$, through its marginal distribution

(3.4)  $X \sim N_n(m,\ \sigma^2 I + \tau^2 A)$,

provides information on $\alpha = (m, \tau^2, A)$. Usually $\sigma^2$ is
taken to be known, as in Section 2, or independently estimable.

Some knowledge of the structure of $m$ and $A$ involving a limited number
of parameters is necessary for (3.4) to be useful. The mean $m$ may be a
regression surface, $m = V\beta$, or a trend surface, $m_{ij}$ being a
polynomial function of the coordinates $i$ and $j$. More complex spatial
models for $m$ involve describing fields. Since our objective in this
section is to describe methods for estimating the unknown variance matrix
$A$, we shall take $m = 0$. Thus the posterior mean of (3.2) will be of
the form

(3.5)  $\hat{\mu} = (I-B)X$.

A spatial autoregressive distribution arises from the simultaneous model

(3.6)  $\mu_{ij} = \rho(\mu_{i-1,j} + \mu_{i,j-1} + \mu_{i+1,j} + \mu_{i,j+1})/4 + \varepsilon_{ij}$

with appropriate modifications on the boundary of the lattice,
$\varepsilon \sim N(0, \tau^2 I)$, and $|\rho| < 1$ to ensure stationarity
(Cliff and Ord [8]). Similar models have been considered by Besag ([5]),
who uses a conditional analog of (3.6), and by Ord ([18]), in non-Bayesian
contexts. Hence, with $C$ an $n \times n$ matrix determined by (3.6),
$\mu = \rho C\mu + \varepsilon$, or $\mu = (I - \rho C)^{-1}\varepsilon$.
Thus in (3.1) we have

(3.7)  $A = (I - \rho C)^{-1}(I - \rho C')^{-1}$,

which depends only on the parameter $\rho$.

Since (3.4) implies that $E(XX') = \sigma^2 I + \tau^2 A$, Haff ([13]),
in the context of autoregressive priors for time series, suggested
estimating $B$ in (3.2) and (3.3) by
$\hat{B} = \sigma^2\{tXX' + (1-t)M\}^{-1}$, where $0 < t < 1$ and $M$ is
an a priori value for $\sigma^2 I + \tau^2 A$. This artifice is required
because $XX'$ is singular. We follow instead the more standard practice of
obtaining maximum likelihood estimates for $\tau^2$ and $\rho$ (Whittle
[21]).

We follow the suggestion of Herzberg (in the discussion of Bartlett [3])
for design matrices, and now write $C$ in terms of Kronecker products

(3.8)  $C = I_p \otimes C_q + C_p \otimes I_q$,

where $I_r$ is the $r \times r$ identity matrix and $C_r$ depends on how
(3.6) is modified at boundary points. If we let $\mu_{ij} = 0$ in (3.6)
when either $i \notin \{1, \ldots, p\}$ or $j \notin \{1, \ldots, q\}$,
then $C_r$ will be the $r \times r$ matrix with $\tfrac{1}{4}$ appearing
on the first upper and lower off-diagonals, and zeros elsewhere. Then
$C_r = P_r \Lambda_r P_r'$, where $P_r$ is orthogonal and $\Lambda_r$ is a
diagonal matrix, so that (3.8) may be written as

$C = (P_p \otimes P_q)(I_p \otimes \Lambda_q + \Lambda_p \otimes I_q)(P_p \otimes P_q)' = P \Lambda P'$.

Now it is easy to show that $\sigma^2 I + \tau^2 A = P \Gamma P'$ where
$\Gamma = \sigma^2 I + \tau^2 (I - \rho\Lambda)^{-2}$. Hence
$|\sigma^2 I + \tau^2 A| = \prod \gamma_{ij} = \prod \{\sigma^2 + \tau^2 (1 - \rho\lambda_{ij})^{-2}\}$.

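The log likelihood arising from (3.4) is then

$\ell(\tau^2, \rho; Y) = -\tfrac{1}{2}\left(\sum \log \gamma_{ij} + \sum Y_{ij}^2/\gamma_{ij}\right)$,

where $Y = P'X$. The partial derivatives of $\ell$ with respect to
$\tau^2$ and $\rho$, when equated to zero, give the likelihood equations

(3.9)  $\sum (1 - \rho\lambda_{ij})^{-2} (\gamma_{ij} - Y_{ij}^2)\, \gamma_{ij}^{-2} = 0$,
       $\sum \tau^2 \lambda_{ij} (1 - \rho\lambda_{ij})^{-3} (\gamma_{ij} - Y_{ij}^2)\, \gamma_{ij}^{-2} = 0$,

where $\gamma_{ij} = \sigma^2 + \tau^2 (1 - \rho\lambda_{ij})^{-2}$.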

Note that $P_r$, $\Lambda_r$, and so $P$ and $\Lambda$, can be easily
found (Anderson [1]). The numerical solution of (3.9) for $\hat{\tau}^2$
and $\hat{\rho}$ is obtained using an iterative procedure such as
Newton-Raphson or the method of scoring. Reasonable initial values are

(3.10)  $\hat{\rho}_0 = 0$ and $\hat{\tau}_0^2 = X'X/pq - \sigma^2$.

The computed values of $\hat{\tau}^2$ and $\hat{\rho}$ are then used to
find $A$ and $B$ in (3.7) and (3.3), leading to the estimate (3.5) for
$\mu$.
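
The spectral form of these calculations can be sketched as follows. The
sketch assumes the zero-boundary convention, so that $C_r$ is tridiagonal
with $\tfrac{1}{4}$ on its first off-diagonals; for such a matrix the
eigenvalues are $\tfrac{1}{2}\cos(k\pi/(r+1))$ and the orthonormal
eigenvectors are sine vectors, a standard closed form. The function names
are hypothetical, and the maximization over $(\tau^2, \rho)$ is left to
any iterative optimizer.

```python
import math

def sine_basis(r):
    """Eigenvalues and orthonormal (symmetric) eigenvector matrix of C_r."""
    lam = [0.5 * math.cos(k * math.pi / (r + 1)) for k in range(1, r + 1)]
    c = math.sqrt(2.0 / (r + 1))
    P = [[c * math.sin(i * k * math.pi / (r + 1)) for k in range(1, r + 1)]
         for i in range(1, r + 1)]
    return lam, P

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def spectral_transform(X):
    """Y = P'X for P = P_p kron P_q, with X stored as a p x q array."""
    p, q = len(X), len(X[0])
    lam_p, Pp = sine_basis(p)
    lam_q, Pq = sine_basis(q)
    Y = matmul(matmul(Pp, X), Pq)          # P_p and P_q are symmetric here
    lam = [[lam_p[i] + lam_q[j] for j in range(q)] for i in range(p)]
    return Y, lam, Pp, Pq

def gamma_values(lam, sigma2, tau2, rho):
    return [[sigma2 + tau2 / (1.0 - rho * l) ** 2 for l in row] for row in lam]

def log_likelihood(X, sigma2, tau2, rho):
    """Log likelihood of (tau2, rho), up to an additive constant."""
    Y, lam, _, _ = spectral_transform(X)
    g = gamma_values(lam, sigma2, tau2, rho)
    return -0.5 * sum(math.log(gij) + y * y / gij
                      for yr, gr in zip(Y, g) for y, gij in zip(yr, gr))

def eb_estimate(X, sigma2, tau2, rho):
    """Estimate (3.5), (I - B)X, computed as P diag(1 - sigma2/gamma) P'X."""
    Y, lam, Pp, Pq = spectral_transform(X)
    g = gamma_values(lam, sigma2, tau2, rho)
    shrunk = [[(1.0 - sigma2 / gij) * y for y, gij in zip(yr, gr)]
              for yr, gr in zip(Y, g)]
    return matmul(matmul(Pp, shrunk), Pq)

def initial_values(X, sigma2):
    """Starting values (3.10): rho = 0 and tau2 = X'X/pq - sigma2."""
    n = len(X) * len(X[0])
    return 0.0, sum(x * x for row in X for x in row) / n - sigma2

X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
estimate = eb_estimate(X, 1.0, 3.0, 0.0)
loglik = log_likelihood(X, 1.0, 3.0, 0.0)
rho0, tau2_0 = initial_values(X, 1.0)
```

With $\rho = 0$ the prior has $A = I$, so every observation is shrunk
toward zero by the same factor $\tau^2/(\sigma^2 + \tau^2)$, which gives a
simple check on the code.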

These  results  are  a special  case  of  results  in  Kostal  ([15])  which 
include  other  autoregressive  structures,  spatial  moving  averages,  space- 
time  processes,  and  non-zero  m. 

4.  Markov  Prior  Models  and  Their  Use

The line transect analysis in Section 2 involves a special case of a
general class of Markov prior models for crop codes $\theta$. In this
section we outline the general class, and discuss its use in construction
of Bayes and empirical Bayes estimates for $\theta$. A similar development
would be possible in the somewhat different context of Section 3,
providing an alternative to models such as (3.6).

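We begin by noting that the first-order binary Markov model on the
transect, as in (2.2) or (2.3), can be written in two other, equivalent
forms. Again let $\theta = (\theta_1, \ldots, \theta_n)$ and write
$\theta_{(i)} = \{\theta_j : j \ne i\}$. Then, for some normalizing
constant $K_n(\alpha)$ depending on $\alpha = (\beta, \gamma)$,

(4.1)  $\Pr(\theta) = K_n(\alpha) \exp\left(-\beta \sum \theta_i - \gamma \sum \theta_i \theta_{i+1}\right)$

and

(4.2)  $\Pr(\theta_i \mid \theta_{(i)}) = \Pr(\theta_i \mid \theta_{i-1}, \theta_{i+1}) \propto \exp\{-\beta\theta_i - \gamma(\theta_{i-1}\theta_i + \theta_i\theta_{i+1})\}$,

where, in the notation of Section 2, $\beta = 2\phi_2 - \phi_1$ and
$\gamma = -2\phi_2$ (expand $F$ in (2.3) using $\theta_i^2 = \theta_i$ and
ignore endpoints). Simple boundary modifications of (4.2) are needed at
$i = 1, n$. The formulae (4.1) and (4.2) correspond respectively to a
Gibbs distribution and its local, conditional representation, both of
which are capable of wide generalization to accommodate the more complex
situations with which we wish to work.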
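
The equivalence of the joint and the local forms can be verified by brute
force on a short transect. The sketch below is illustrative only: it
compares the conditional probability obtained from the Gibbs form (4.1)
with the local form (4.2), dropping the missing neighbor at the two ends,
and the parameter values and function names are hypothetical.

```python
import math
from itertools import product

def gibbs_weight(theta, beta, gamma):
    """Unnormalized probability from (4.1)."""
    pairs = sum(a * b for a, b in zip(theta, theta[1:]))
    return math.exp(-beta * sum(theta) - gamma * pairs)

def conditional_from_joint(theta, i, beta, gamma):
    """Pr(theta_i = 1 | all other theta_j), computed from the joint form (4.1)."""
    t1 = list(theta)
    t1[i] = 1
    t0 = list(theta)
    t0[i] = 0
    w1 = gibbs_weight(t1, beta, gamma)
    w0 = gibbs_weight(t0, beta, gamma)
    return w1 / (w0 + w1)

def conditional_local(theta, i, beta, gamma):
    """Pr(theta_i = 1 | neighbors), from the local form (4.2)."""
    left = theta[i - 1] if i > 0 else 0
    right = theta[i + 1] if i < len(theta) - 1 else 0
    return 1.0 / (1.0 + math.exp(beta + gamma * (left + right)))

# Largest discrepancy over all 2^5 configurations and all 5 positions.
max_gap = max(
    abs(conditional_from_joint(t, i, 0.5, -1.2) - conditional_local(t, i, 0.5, -1.2))
    for t in product([0, 1], repeat=5)
    for i in range(5)
)
```

The normalizing constant $K_n(\alpha)$ never has to be computed, because
it cancels in the conditional probabilities.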

The general structure illustrated by (4.1) can be described as follows
(Besag [4] and Isham [14]). Suppose that $\theta_i$ depends,
stochastically, on values of $\theta_j$ at the neighboring pixels,
$j \in N(i)$, in the sense that
$\Pr(\theta_i \mid \theta_{(i)}) = \Pr(\theta_i \mid \theta_j,\ j \in N(i))$;
for example, we see in (4.2) that $N(i) = \{i-1, i+1\}$ except at the
boundaries, where $N(1) = \{2\}$ and $N(n) = \{n-1\}$. Next, define a set
$C$ of pixels to be a clique if $C$ contains a single pixel or if all
pairs of pixels in $C$ are neighbors of each other in the system of
neighborhoods $N(\cdot)$; for example, with (4.2) the list of sets $C$ is
$\{1\}, \{2\}, \ldots, \{n\}, \{1,2\}, \{2,3\}, \ldots, \{n-1,n\}$. Then
the general form of Markov model for $\theta$ is

(4.3)  $\Pr(\theta) = K_n(\alpha) \exp\left\{-\sum_C d_C(\theta_C, \alpha)\right\}$,

wherein $d_C(\cdot)$ depends on $\theta$ only through
$\theta_C = \{\theta_j : j \in C\}$; for example, in (4.1)
$d_{\{i\}}(\theta_i, \alpha) = \beta\theta_i$ and
$d_{\{i,i+1\}}(\theta_i, \theta_{i+1}, \alpha) = \gamma\theta_i\theta_{i+1}$.
Corresponding to (4.3) is the local, conditional model


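(4.4)  $\Pr(\theta_i \mid \theta_{(i)}) = \Pr(\theta_i \mid \theta_j,\ j \in N(i)) = \dfrac{\exp\left\{-\sum_{C : i \in C} d_C(\theta_i, \theta_{C \setminus i}, \alpha)\right\}}{\sum_s \exp\left\{-\sum_{C : i \in C} d_C(s, \theta_{C \setminus i}, \alpha)\right\}}$,

where $\theta_{C \setminus i} = \{\theta_j : j \in C,\ j \ne i\}$;
equation (4.2) is an example.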

Both (4.3) and (4.4) can be used for discrete or continuous values of
$\theta_i$, so long as the normalizing constant $K_n(\alpha)$ exists.
Also, the pixel label $i$ can be two-dimensional, so that general lattice
models are included. Since the function $d_C(\cdot)$ can depend on $C$, it
is also possible to define nonhomogeneous models for $\theta$ within this
Markov framework.

By way of illustration, we outline the two-dimensional analog of our
earlier binary transect model. Ignoring boundary effects, suppose that the
neighborhood $N(i)$ of pixel $i = (j,k)$ is the set of four closest
pixels, labelled $(j-1,k)$, $(j+1,k)$, $(j,k-1)$, $(j,k+1)$. Then the set
of cliques, $C$, consists of all single pixels $(j,k)$, all pairs
$\{(j,k), (j+1,k)\}$, and all pairs $\{(j,k), (j,k+1)\}$. Now take the
following simple forms for $d_C(\cdot)$:

    C                        d_C
    {(j,k)}                  $\beta\theta_{j,k}$
    {(j,k), (j+1,k)}         $\gamma_1\theta_{j,k}\theta_{j+1,k}$
    {(j,k), (j,k+1)}         $\gamma_2\theta_{j,k}\theta_{j,k+1}$

The conditional representation (4.4) then gives the autologistic form

(4.5)  $\log \dfrac{\Pr(\theta_{j,k} = 0 \mid \theta_{(j,k)})}{\Pr(\theta_{j,k} = 1 \mid \theta_{(j,k)})} = \beta + \gamma_1(\theta_{j-1,k} + \theta_{j+1,k}) + \gamma_2(\theta_{j,k-1} + \theta_{j,k+1})$,

which is isotropic if $\gamma_1 = \gamma_2$.

Recall that these Markov prior models for $\theta$ are intended to model
the spatial correlation that exists among crop codes, this correlation
being a stochastic way of describing the phenomenon of fields, within
which crop codes are identical. To illustrate the kind of patterns that
can be generated by the models, we simulated the autologistic model (4.5)
on a 10x10 lattice (with fixed boundary) with $\beta = 2$,
$\gamma_1 = \gamma_2 = -1$, and obtained the pattern in Figure 4.1.

FIGURE 4.1  Simulated map of binary crop codes using
model (4.5) with $\beta = 2$, $\gamma_1 = \gamma_2 = -1$.
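
The paper does not say how the simulation was carried out. One standard
possibility, sketched here purely as an assumption, is single-site Gibbs
sampling: each pixel is redrawn in turn from the conditional distribution
(4.5) given its four neighbors. The fixed boundary is taken to be a frame
of zeros, and the function name, number of sweeps, and seed are
hypothetical choices.

```python
import math
import random

def simulate_autologistic(rows, cols, beta, gamma1, gamma2, sweeps=200, seed=0):
    """Draw a binary map from the autologistic model (4.5) by Gibbs sampling."""
    rng = random.Random(seed)
    # Pad the lattice with a fixed frame of zeros (assumed boundary condition).
    theta = [[0] * (cols + 2) for _ in range(rows + 2)]
    for j in range(1, rows + 1):
        for k in range(1, cols + 1):
            theta[j][k] = rng.randint(0, 1)
    for _ in range(sweeps):
        for j in range(1, rows + 1):
            for k in range(1, cols + 1):
                # Log odds of code 0 against code 1, as in (4.5).
                log_odds0 = (beta
                             + gamma1 * (theta[j - 1][k] + theta[j + 1][k])
                             + gamma2 * (theta[j][k - 1] + theta[j][k + 1]))
                p1 = 1.0 / (1.0 + math.exp(log_odds0))
                theta[j][k] = 1 if rng.random() < p1 else 0
    return [row[1:-1] for row in theta[1:-1]]

grid = simulate_autologistic(10, 10, 2.0, -1.0, -1.0, sweeps=50, seed=1)
```

With $\beta = 2$ and $\gamma_1 = \gamma_2 = -1$ a pixel whose four
neighbors are all of type 1 has log odds $-2$ in favor of type 1, and a
pixel with no type 1 neighbors has log odds $+2$ in favor of type 0, which
is what produces field-like clusters.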


To return to the general development, our aim is to use models such as
(4.3) as priors in conjunction with the sampling model for satellite data
$X = (X_1, \ldots, X_n)$, so as to obtain Bayes or empirical Bayes
inferences about crop codes $\theta$. We assume the $X_j$ to be
conditionally independent, with joint density
$\prod_j f(x_j \mid \theta_j)$. Then, by Bayes's Theorem,

(4.6)  $\Pr(\theta \mid X = x) = K_n(x, \alpha) \exp\left\{-\sum_C d_C(\theta_C, \alpha) + \sum_i \log f(x_i \mid \theta_i)\right\}$
       $= K_n(x, \alpha) \exp\left\{-\sum_C d_C(x, \theta, \alpha)\right\}$,

say, where $d_C(x, \theta, \alpha) = d_C(\theta_C, \alpha)$ except when
$C$ is a singleton $C = \{i\}$, in which case

$d_{\{i\}}(x, \theta, \alpha) = d_{\{i\}}(\theta_i, \alpha) - \log f(x_i \mid \theta_i)$.

The normalizing constant $K_n(x, \alpha)$ in (4.6) is (up to the factor
$K_n(\alpha)$) the reciprocal of the marginal likelihood of $\alpha$, from
which we would, in principle, estimate $\alpha$ using data $x$.

Equation (4.6) reduces to (2.4) for the binary transect model of
Section 2, and empirical Bayes inference proceeds as described there.

In general, however, it is difficult to use (4.6) even to obtain the
posterior mode for $\theta$ when the prior parameters $\alpha$ are
specified. Geman and Geman ([12]) present an ingenious iterative
"relaxation-annealing" algorithm for obtaining this posterior mode, taking
advantage of the fact that the mode is invariant under scale change of the
exponential family (4.6). The algorithm does not yield marginal posterior
probabilities $\Pr(\theta_i \mid x, \alpha)$, for which the direct
approach of Section 2 led to logistic spatial moving averages in the
binary transect case.

At  the  present  stage  of  development,  several  important  questions 
remain  to  be  answered  before  the  promising  Markov  prior  approach  yields 
a clean  empirical  Bayes  algorithm  for  image  reconstruction  from  LANDSAT 
data.  In  the  next  section  we  discuss  these  questions  briefly  and  indicate 
where  further  research  is  going. 

5.  Discussion  and  Outline  of  Further  Research 


We  have  emphasized  the  importance  of  modeling  the  underlying 
structure  of  fields,  where  crop  types  occur  in  pixel  clusters.  In  order 
that  new  geographical  areas  be  amenable  to  adaptive  statistical  analysis, 
empirical  Bayes  methods  are  needed.  Use  of  the  class  of  Markov  priors 


for the crop-type "map" $\theta$ leads us to the following questions:

(i) How can we determine the order of the Markov prior model?

(ii) Can one usefully add an "edge" distribution, as described by Geman
and Geman ([12])?

(iii) What methods or algorithms are possible for estimating the
parameters $\alpha$ which determine distribution (4.3)?

(iv) Can we, and need we, estimate $\Pr(\theta_i \mid x)$ in addition to
calculating the posterior modal estimates of $\theta_i$?

To take these in reverse order, we note first that the advantage of
having $\Pr(\theta_i \mid x)$ is that aggregate characteristics, such as
proportion of area covered by a particular crop type, could be more
efficiently estimated. We conjecture that generalizations of (2.6) will
prove useful. On question (iii), one possibility is to use some type of EM
algorithm (with $\theta$ as the missing value vector) embedded in the
relaxation-annealing algorithm referred to in Section 4. There are three
points to bear in mind here: (a) the iterative algorithm is slow,
especially in the context of the simple model of Section 2;
(b) likelihood estimation of $\alpha$ from $\theta$ is non-trivial (see
Besag [4]); (c) marginal correlation properties of $x$, as used in
Section 2, are made difficult on the lattice by the poor understanding of
correlation properties for $\theta$ (Bartlett [2]).

On  question  (i),  standard  contingency  table  methods  would  be
applicable  if  results  for  ground-truth  data  could  be  extrapolated.
Large-scale  ground-truth  data  are  available  for  the  U.S.  and  will  be 
analyzed. 
Abstract


This  paper  is  concerned  with  the  use  of  spline  functions  in  the 
development  of  classification  algorithms.  In  particular,  a method  is 
formulated  for  producing  spline  approximations  to  bivariate  density 
functions  where  the  density  function  is  described  by  a histogram  of 
measurements.  The  resulting  approximations  are  then  incorporated  into  a 
Bayesian  classification  procedure  for  which  the  Bayes  decision  regions  and 
the  probability  of  misclassification  can  be  readily  computed.  Some 
preliminary  numerical  results  are  presented  to  illustrate  the  method. 




§1.  Introduction 


This  paper  is  a continuation  of  our  earlier  work  [13]  on  the  use  of 
spline  functions  as  a tool  in  statistical  pattern  classification.  Whereas 
in  [13]  we  dealt  only  with  univariate  problems,  our  aim  here  is  to  develop 
methods  which  are  suitable  for  2 or  more  dimensions. 

The  main  mathematical  tool  to  be  used  here  is  the  tensor-product 
splines  (see  Section  3 below).  In  particular,  we  show  how  splines  can  be
used  to  estimate  multivariate  conditional  density  functions  for  the 
classes  of  interest.  Using  tensor-product  B-splines,  we  then  develop 
efficient  algorithms  for  finding  the  associated  classification  regions. 
Moreover,  we  also  show  how  to  compute  the  probability  of  misclassification 
associated  with  the  classification  method. 

The  paper  is  divided  into  8 sections.  In  Section  2 we  discuss  the 
general  Bayes  classification  procedure.  In  Section  3 we  introduce  the 
tensor-product  splines  and  discuss  their  use  in  the  general  problem  of 
density  estimation.  In  Section  4 we  present  a specific  method  for 
estimating  densities  based  on  biquadratic  splines.  The  problems  of 
computing  the  related  classification  regions  and  the  probability  of 
misclassification  are  treated  in  Sections  5 and  6,  respectively.  We 
close  the  paper  with  examples  and  remarks. 

§2.  The  Bayes  Classification  Procedure. 

Suppose that some group $\Omega$ of objects can be divided into NC
classes which we will denote by $\Pi_1, \ldots, \Pi_{NC}$. Now suppose
that we are trying to decide which class a given randomly selected object
belongs to on the basis of $d$ measurements which have been taken on the
object. In particular, suppose $X$ is a mapping from
$\Omega = \Pi_1 \cup \cdots \cup \Pi_{NC}$ into $R^d$ such that if
$w \in \Omega$, then $X(w) = (x_1, \ldots, x_d)$ is the measurement taken
on $w$. Finally, suppose that for each $i = 1, \ldots, NC$ we know the a
priori probability $\alpha_i$ that an object will fall in class $\Pi_i$
and that we also know the conditional density function $P_i$ associated
with measurements taken from the $i$-th class.

Given this stochastic framework, the Bayes optimal classifier is defined
as follows:

Assign an element $w$ to the $i$-th class $\Pi_i$ if and only if its
measurement $X(w)$ belongs to the set $R_i$,

where $R_1, \ldots, R_{NC}$ are the Bayes decision regions defined by

(2.1)  $R_i = \{x \in R^d : \alpha_i P_i(x) > \alpha_j P_j(x) \text{ for all } j \ne i\}$.

The numerical problem of identifying the Bayes decision regions is
equivalent to finding the boundaries of the sets $R_i$. These in turn are
defined by the equations $\alpha_i P_i(x) - \alpha_j P_j(x) = 0$ for
$i, j = 1, \ldots, NC$.

There are several well-known ways of measuring the quality of the Bayes
classification scheme described above. One convenient way is to compute
the probability of misclassification (pmc) (cf. [1,2]) defined by

(2.2)  $G = 1 - \int_{R^d} \max_i [\alpha_i P_i(x)]\, dx = 1 - \sum_{i=1}^{NC} \alpha_i \int_{R_i} P_i(x)\, dx$.

In general, the evaluation of the pmc $G$ is a difficult problem since it
involves integration over irregularly-shaped regions in $d$-space.
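
To make (2.1) and (2.2) concrete, the following sketch classifies a
measurement and approximates the pmc by a midpoint rule. It is restricted
to $d = 1$ for brevity, whereas the paper is concerned with $d \ge 2$; the
two normal class densities, the integration limits, and the function names
are assumptions chosen only for illustration.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-(x - mean) ** 2 / (2.0 * sd * sd)) / (sd * math.sqrt(2.0 * math.pi))

def bayes_classify(x, priors, densities):
    """Index i of the region R_i in (2.1) containing x (ties go to the lowest index)."""
    scores = [a * P(x) for a, P in zip(priors, densities)]
    return max(range(len(scores)), key=scores.__getitem__)

def pmc_on_grid(priors, densities, lo, hi, steps=20000):
    """Midpoint-rule approximation to G = 1 - integral of max_i alpha_i P_i(x)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += max(a * P(x) for a, P in zip(priors, densities)) * h
    return 1.0 - total

priors = [0.5, 0.5]
densities = [lambda x: normal_pdf(x, 0.0, 1.0), lambda x: normal_pdf(x, 2.0, 1.0)]
label_low = bayes_classify(0.2, priors, densities)
label_high = bayes_classify(1.7, priors, densities)
G = pmc_on_grid(priors, densities, -10.0, 12.0)
```

For two equally likely unit-variance normal classes with means 0 and 2
the decision boundary is at $x = 1$ and the pmc is the standard normal
tail probability beyond 1, about 0.1587. In two or more dimensions the
same integral runs over irregular regions, which is the difficulty noted
above.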

To  apply  the  Bayes  classification  procedure  in  a practical  setting, 
the  following  steps  need  to  be  carried  out: 

1)  estimate NC = number of classes,

2)  estimate the a priori probabilities $\alpha_1, \ldots, \alpha_{NC}$,

3)  estimate the density functions $P_1, \ldots, P_{NC}$,

4)  estimate the decision regions $R_1, \ldots, R_{NC}$,

5)  estimate the value $G$ of the pmc.

In this paper we shall concentrate on steps 3) - 5). For some methods
dealing with problems 1) and 2), see [15,24].

§3.  Estimating  Densities  Using  Splines. 

In  this  section  we  discuss  the  problem  of  estimating  a multivariate 
probability  density  function  on  the  basis  of  a finite  number  of 
measurements  taken  on  the  underlying  random  variable.  This  is  a standard 
statistical  problem,  and  there  are  many  parametric  and  non-parametric 
methods  available  (see  e.g.  [4,6,7,11,15,21-24]).  Since  in  many
applications  the  densities  of  interest  are  not  standard  parametric 
densities,  we  focus  on  non-parametric  methods,  and  in  particular  on 
methods  which  are  based  on  tensor-product  splines. 


There  are  several  compelling  reasons  for  selecting  splines  to 
approximate  density  functions.  These  include  among  others  (cf.  [8,19]):

1)  splines  are  easy  to  store  and  manipulate  in  a digital  computer,  2) 
there  are  stable  efficient  algorithms  for  evaluating  splines  as  well  as 
their  derivatives  and  integrals,  3)  splines  are  smooth  but  at  the  same 
time  flexible,  and  4)  splines  are  capable  of  approximating  smooth 
functions  to  high  orders  of  accuracy. 

Our  method  for  constructing  a spline  s approximating  an  unknown 
density  function  P on  the  basis  of  a finite  number  of  measurements  will 
proceed  in  two  steps: 

1)  use  the  data  to  construct  a histogram, 

2)  approximate  the  histogram  by  a spline. 

The problem of histogramming data has been the subject of considerable
study  by  statisticians.  Hence,  throughout  the  remainder  of  this  section 
we  concentrate  on  step  2,  assuming  that  the  histogram  has  been 
constructed. 
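
Although step 1 is not treated in this paper, a minimal sketch of one possible histogram construction may help fix ideas. The function name and the normalization (bin height times bin area equals the fraction of sample points in the bin) are assumptions of this example.

```python
from bisect import bisect_right

def histogram2d(points, xt, yt):
    """Return h with h[i][j] the height of bin [xt[i], xt[i+1]) x [yt[j], yt[j+1]),
    scaled so that height times bin area is the fraction of points in the bin."""
    nbx, nby = len(xt) - 1, len(yt) - 1
    counts = [[0] * nby for _ in range(nbx)]
    for x, y in points:
        i = bisect_right(xt, x) - 1      # index of the bin containing x
        j = bisect_right(yt, y) - 1      # index of the bin containing y
        if 0 <= i < nbx and 0 <= j < nby:
            counts[i][j] += 1            # points outside the grid are ignored
    n = len(points)
    return [[counts[i][j] / (n * (xt[i + 1] - xt[i]) * (yt[j + 1] - yt[j]))
             for j in range(nby)] for i in range(nbx)]
```

With this scaling the total volume under the histogram is the fraction of the sample that falls inside the grid.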

Before  we  can  proceed  any  further,  we  need  to  introduce  some 
notation.  In  order  to  avoid  undue  notational  complications,  we  shall  now 
restrict  our  discussion  to  the  case  of  two  dimensions;  i.e.,  d = 2. 

For the extension to d > 2, see Remark 3.

We  begin  by  introducing  some  parameters  to  describe  the  histogram 
which  we  wish  to  fit.  Let  nbx  and  nby  be  positive  integers  denoting  the 
number  of  bins  in  the  x-  and  y-directions,  respectively.  Suppose  that 
$xt_1 < \cdots < xt_{nbx+1}$ and $yt_1 < \cdots < yt_{nby+1}$. Let $h_{ij}$ be nonnegative
real numbers. Then the function

(3.1)  $h(x,y) = \begin{cases} h_{ij}, & \text{if } xt_i \le x < xt_{i+1} \text{ and } yt_j \le y < yt_{j+1} \text{ with } 1 \le i \le nbx \text{ and } 1 \le j \le nby, \\ 0, & \text{otherwise} \end{cases}$

represents the histogram we want to fit.

We now introduce B-splines. We begin with univariate B-splines.
Suppose that $yx_1 < \cdots < yx_{nx+mx}$ is a set of real numbers, where nx and
mx are positive integers. Then associated with these points, there
exists a set of mx-th order normalized B-splines $N_1^{mx}(x), \ldots, N_{nx}^{mx}(x)$ with
the properties (cf. [8,19]):

$N_i^{mx}(x)$ is a piecewise polynomial of order mx with join points
(knots) located at the points $yx_i, \ldots, yx_{i+mx}$;

$N_i^{mx}(x)$ has mx-2 continuous derivatives on $\mathbb{R}$;

$N_i^{mx}(x)$ is positive on $(yx_i, yx_{i+mx})$ and vanishes elsewhere;

$N_i^{mx}(x)$ can be computed efficiently and accurately.

Suppose my and ny are also positive integers, and that
$N_1^{my}(y), \ldots, N_{ny}^{my}(y)$ are similar B-splines associated with a knot
sequence $yy_1 < \cdots < yy_{ny+my}$. We now define tensor-product B-splines as
follows:

(2.4)  $N_{ij}^{mx,my}(x,y) = N_i^{mx}(x) \, N_j^{my}(y)$,  $i = 1, \ldots, nx$ and $j = 1, \ldots, ny$.

The knot sequences $yx_1, \ldots, yx_{nx+mx}$ and $yy_1, \ldots, yy_{ny+my}$ divide
the rectangle $\Omega = [yx_1, yx_{nx+mx}] \times [yy_1, yy_{ny+my}]$ into subrectangles
$\Omega_{ij} = [yx_i, yx_{i+1}) \times [yy_j, yy_{j+1})$, $i = 1, \ldots, nx+mx-1$ and $j = 1, \ldots, ny+my-1$.

It follows from the properties of the univariate B-splines that the
tensor-product B-splines have the properties:

$N_{ij}^{mx,my}(x,y)$ is a polynomial of order mx in x and order my in y on
each subrectangle $\Omega_{\nu\mu}$, $\nu = 1, \ldots, nx+mx-1$, $\mu = 1, \ldots, ny+my-1$;

$N_{ij}^{mx,my} \in C^{mx-2,my-2}(\mathbb{R}^2)$;

$N_{ij}^{mx,my}(x,y) \ge 0$ for all (x,y);

$N_{ij}^{mx,my}(x,y) > 0$ for $yx_i < x < yx_{i+mx}$ and $yy_j < y < yy_{j+my}$.

Figure 1 shows a view of a biquadratic B-spline (mx = my = 3).
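
The properties listed above can be checked numerically with a short sketch based on the standard B-spline recurrence. The function names and the 0-based indexing are assumptions of this example; production codes such as those in [8] organize the computation differently.

```python
def bspline(i, m, knots, x):
    """Value at x of the normalized B-spline of order m (degree m - 1)
    with support [knots[i], knots[i + m]); i is 0-based in this sketch."""
    if m == 1:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    value = 0.0
    left = knots[i + m - 1] - knots[i]
    if left > 0.0:
        value += (x - knots[i]) / left * bspline(i, m - 1, knots, x)
    right = knots[i + m] - knots[i + 1]
    if right > 0.0:
        value += (knots[i + m] - x) / right * bspline(i + 1, m - 1, knots, x)
    return value

def tensor_bspline(i, j, mx, my, kx, ky, x, y):
    """Tensor-product B-spline: product of the two univariate B-splines."""
    return bspline(i, mx, kx, x) * bspline(j, my, ky, y)
```

On equally spaced knots a quadratic B-spline (order 3) rises from zero, peaks at 3/4 in the middle of its support, and vanishes outside three knot intervals, which is the shape seen in each coordinate direction of a biquadratic tensor-product B-spline.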

Our  aim  is  to  approximate  histograms  (and  thus  the  underlying 
densities)  by  a linear  combination  of  the  tensor-product  B-splines  of  the 
following  form 


(3.2)  $s(x,y) = \sum_{i=1}^{nx} \sum_{j=1}^{ny} c_{ij} N_{ij}^{mx,my}(x,y) .$

To construct an approximation as in (3.2), we must select the orders
mx, my, the knot sequences yx and yy, and finally the coefficients $c_{ij}$.
In the following section we discuss how to compute these coefficients.

Figure 1. A biquadratic tensor-product spline


§4.  A Biquadratic  Spline  Density  Estimator. 

In this section we discuss a specific density estimation algorithm
based on biquadratic splines. First we list the input and output.

Input:

nbx = number of bins in the x-direction

nby = number of bins in the y-direction

$xt_1 < \cdots < xt_{nbx+1}$, the bin edges in the x-direction

$yt_1 < \cdots < yt_{nby+1}$, the bin edges in the y-direction

$h_{ij}$, $i = 1, \ldots, nbx$ and $j = 1, \ldots, nby$, the histogram values.

Output:

mx = order of the spline in the x-direction

my = order of the spline in the y-direction

nx = number of B-splines used in the x-direction

ny = number of B-splines used in the y-direction

$yx_1 < \cdots < yx_{nx+mx}$, the knots in the x-direction

$yy_1 < \cdots < yy_{ny+my}$, the knots in the y-direction

$c_{ij}$, $i = 1, \ldots, nx$ and $j = 1, \ldots, ny$, the coefficients of the spline.

Algorithm 4.1. (Biquadratic spline density estimation).

1.  Set mx = my = 3.

2.  Set nx = nbx and ny = nby.

3.  Set

$yx_1 = xt_1 - (xt_2 - xt_1)$,
$yx_i = xt_{i-1}$,  $i = 2, \ldots, nbx+2$,
$yx_{nbx+3} = xt_{nbx+1} + (xt_{nbx+1} - xt_{nbx})$.

4.  Set

$yy_1 = yt_1 - (yt_2 - yt_1)$,
$yy_i = yt_{i-1}$,  $i = 2, \ldots, nby+2$,
$yy_{nby+3} = yt_{nby+1} + (yt_{nby+1} - yt_{nby})$.

5.  Compute the $c_{ij}$'s which solve the linear system of equations

$\sum_{i=1}^{nx} \sum_{j=1}^{ny} c_{ij} \int_{xt_\nu}^{xt_{\nu+1}} \int_{yt_\mu}^{yt_{\mu+1}} N_i^3(x) N_j^3(y) \, dy \, dx = h_{\nu\mu} (xt_{\nu+1} - xt_\nu)(yt_{\mu+1} - yt_\mu)$,

for $\nu = 1, \ldots, nx$ and $\mu = 1, \ldots, ny$.
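
Steps 3 and 4, together with the one-dimensional integrals that appear in step 5, can be sketched as follows. The function names and 0-based indexing are assumptions of this example. Simpson's rule is used for the bin integrals only because a quadratic B-spline is a single quadratic polynomial on each bin, so the rule is exact there; the paper itself refers to standard B-spline methods for these integrals.

```python
def quadratic_knots(t):
    """Steps 3-4: extend the bin edges t by one reflected knot on each side."""
    return [t[0] - (t[1] - t[0])] + list(t) + [t[-1] + (t[-1] - t[-2])]

def bspline(i, m, knots, x):
    """Normalized B-spline of order m with support [knots[i], knots[i + m])."""
    if m == 1:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    value = 0.0
    left = knots[i + m - 1] - knots[i]
    if left > 0.0:
        value += (x - knots[i]) / left * bspline(i, m - 1, knots, x)
    right = knots[i + m] - knots[i + 1]
    if right > 0.0:
        value += (knots[i + m] - x) / right * bspline(i + 1, m - 1, knots, x)
    return value

def bin_integrals(t):
    """Matrix phi with phi[v][k] = integral over bin v of quadratic B-spline k."""
    knots = quadratic_knots(t)
    nb = len(t) - 1
    phi = [[0.0] * nb for _ in range(nb)]
    for v in range(nb):
        a, b = t[v], t[v + 1]
        mid = 0.5 * (a + b)
        # Only B-splines v-1, v, v+1 overlap bin v.
        for k in range(max(0, v - 1), min(nb, v + 2)):
            fa = bspline(k, 3, knots, a)
            fm = bspline(k, 3, knots, mid)
            fb = bspline(k, 3, knots, b)
            phi[v][k] = (b - a) * (fa + 4.0 * fm + fb) / 6.0  # Simpson, exact here
    return phi
```

For equally spaced bins of unit width the rows of the resulting matrix contain the values 1/6, 2/3, 1/6 around the diagonal.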


Discussion: This algorithm produces a biquadratic spline which belongs to
$C^{1,1}(\mathbb{R}^2)$ and whose support is on the rectangle $[yx_1, yx_{nbx+3}] \times [yy_1, yy_{nby+3}]$.
This rectangle is slightly larger than the support of the
histogram, which is the rectangle $H = [xt_1, xt_{nbx+1}] \times [yt_1, yt_{nby+1}]$.

The system of equations in step 5 can be arranged in matrix form as
follows:

(4.1)  $\Phi_x C \Phi_y^T = Z ,$

where $C = (c_{ij})_{i=1,j=1}^{nx,ny}$, $Z = (z_{ij})_{i=1,j=1}^{nx,ny}$, and

(4.2)  $z_{ij} = h_{ij} (xt_{i+1} - xt_i)(yt_{j+1} - yt_j)$,  $i = 1, \ldots, nx$,  $j = 1, \ldots, ny$.

Here $\Phi_x$ is the nx by nx matrix with entries

(4.3)  $(\Phi_x)_{ik} = \int_{xt_i}^{xt_{i+1}} N_k^3(x) \, dx$,  $i,k = 1, \ldots, nx$,

and $\Phi_y$ is the analogous ny by ny matrix. The entries of these matrices
can be generated by standard B-spline methods (cf. [19, Corollary 5.18]).
It is not hard to show (see Lemma 4.2 below) that the matrices $\Phi_x$ and
$\Phi_y$ are tridiagonal, nonsingular, and totally positive.

Following deBoor [9], we may solve the system (4.1) in two steps:

1)  Solve $\Phi_x U = Z$ for the nx by ny matrix U,

2)  Solve $\Phi_y C^T = U^T$ for the nx by ny matrix C.

In carrying out step 1) it is desirable to first compute the LU
decomposition of $\Phi_x$, and then to compute the columns of U by
back-substitution using one column of Z at a time. Because of the bandedness
of $\Phi_x$, the decomposition can be done with special software designed for
banded matrices. Because of the total positivity, the decomposition can
be done without pivoting (cf. [8]). For appropriate linear algebra
packages, see the FORTRAN subroutines BANDET and BANSLV in [8]. The same
comments apply to step 2).
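
The two-step solution can be sketched as follows. This is not the BANDET/BANSLV code of [8]; it is a minimal illustration, with hypothetical function names, that eliminates without pivoting and, for brevity, repeats the elimination for each right-hand side instead of storing one LU factorization.

```python
def solve_tridiagonal(phi, rhs):
    """Solve phi * u = rhs for a tridiagonal phi by Gaussian elimination
    without pivoting (the Thomas algorithm)."""
    n = len(rhs)
    diag = [phi[i][i] for i in range(n)]
    upper = [phi[i][i + 1] for i in range(n - 1)]
    lower = [phi[i + 1][i] for i in range(n - 1)]
    d = list(rhs)
    for i in range(1, n):                      # forward elimination
        w = lower[i - 1] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        d[i] -= w * d[i - 1]
    u = [0.0] * n
    u[-1] = d[-1] / diag[-1]
    for i in range(n - 2, -1, -1):             # back-substitution
        u[i] = (d[i] - upper[i] * u[i + 1]) / diag[i]
    return u

def solve_tensor_system(phi_x, phi_y, Z):
    """Solve phi_x * C * phi_y^T = Z for C in the two steps described above."""
    nx, ny = len(phi_x), len(phi_y)
    # Step 1: phi_x U = Z, one column of Z at a time.
    U = [[0.0] * ny for _ in range(nx)]
    for j in range(ny):
        col = solve_tridiagonal(phi_x, [Z[i][j] for i in range(nx)])
        for i in range(nx):
            U[i][j] = col[i]
    # Step 2: phi_y C^T = U^T, i.e. one row of U (column of U^T) at a time.
    return [solve_tridiagonal(phi_y, U[i]) for i in range(nx)]
```

Only the three diagonals of each matrix are actually read, which reflects the storage savings noted in the text.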

The  technique  suggested  above  is  more  than  simply  a convenient  way  to 
arrange  the  solution  of  the  system  of  equations  in  step  5)  of  the 
algorithm  — the  approach  takes  essential  account  of  the  tensor-product 
nature  of  the  problem  and  results  in  major  savings  in  storage  and 
operation  counts.  In  particular,  in  equation  (4.1)  we  need  only  store  Z 
and the three diagonals of the matrices $\Phi_x$ and $\Phi_y$. The total

operation  count  for  the  algorithm  is  0(nx  + ny  ). 

The equations in step 5) of this algorithm are precisely the
conditions that the volume under the spline surface in bin $\nu\mu$ should be
exactly equal to the volume of the histogram in bin $\nu\mu$, for all
$\nu = 1, \ldots, nbx$ and $\mu = 1, \ldots, nby$.

Lemma 4.2. The nx by nx matrix $\Phi_x$ defined in (4.3) is tridiagonal,
nonsingular, and totally positive.

Proof: The tridiagonal nature of $\Phi_x$ is obvious from the support
properties of the B-splines. In particular, because of the choice of
knots, the only B-splines with nonzero values in the interval
$[xt_i, xt_{i+1}]$ for 1 < i < nx are the B-splines $N_{i-1}^3$, $N_i^3$, and $N_{i+1}^3$.
(On the first and last intervals only two B-splines have nonzero values.)

We turn now to the assertion of total positivity. Suppose that we
select $1 \le \nu_1 < \cdots < \nu_k \le nx$ and $1 \le \mu_1 < \cdots < \mu_k \le nx$ with $1 \le k \le nx$.
We need to show that the determinant

$D\binom{\nu_1, \ldots, \nu_k}{\mu_1, \ldots, \mu_k} = \det\left( \int_{xt_{\nu_i}}^{xt_{\nu_i+1}} N_{\mu_j}^3(x) \, dx \right)_{i,j=1}^{k} = \int_{xt_{\nu_1}}^{xt_{\nu_1+1}} \cdots \int_{xt_{\nu_k}}^{xt_{\nu_k+1}} \det\left( N_{\mu_j}^3(\xi_i) \right)_{i,j=1}^{k} d\xi_1 \cdots d\xi_k$

is nonnegative. But by the total positivity of the B-splines (see e.g.
[19, Theorem 4.65]), the determinant in the integrand of this multiple
integral is nonnegative for all $\xi_1 < \cdots < \xi_k$. Since the intervals over
which the integration is performed are in increasing order, the total
positivity assertion follows.

Finally, the nonsingularity of $\Phi_x$ follows from the fact
(cf. [19, Theorem 4.65]) that the determinant in the integrand of
$D\binom{1, \ldots, nx}{1, \ldots, nx}$ is a continuous function which is strictly positive for
certain values of the $\xi$'s in the intervals over which the integration
takes place. □
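
The lemma can be spot-checked numerically on a small case. The sketch below assumes equally spaced bins of unit width, for which the nonzero entries of $\Phi_x$ are 2/3 on the diagonal and 1/6 beside it, and tests every minor by brute force. The function names are hypothetical, and the cofactor expansion is practical only for very small matrices.

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for c in range(len(m)):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * m[0][c] * det(minor)
    return total

def all_minors_nonnegative(a, tol=1e-12):
    """True if every minor of the square matrix a is >= -tol."""
    n = len(a)
    for k in range(1, n + 1):
        for rows in combinations(range(n), k):
            for cols in combinations(range(n), k):
                sub = [[a[r][c] for c in cols] for r in rows]
                if det(sub) < -tol:
                    return False
    return True

# Phi_x for four equal bins of unit width (an assumption of this example).
phi = [[2.0 / 3.0 if i == j else (1.0 / 6.0 if abs(i - j) == 1 else 0.0)
        for j in range(4)] for i in range(4)]
totally_positive = all_minors_nonnegative(phi)
nonsingular = det(phi) > 0.0
```

A check of this kind does not replace the proof, but it is a quick safeguard when the matrix entries are generated by new code.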


§5.  Computing  The  Bayes  Decision  Regions. 


In this section we discuss the problem of computing approximations to
the Bayes decision regions $R_1, \ldots, R_{NC}$ in the case d = 2. Since in
general we do not know the densities $P_1, \ldots, P_{NC}$, it is natural to
approximate the Bayes regions by using the approximating splines $s_1, \ldots, s_{NC}$
in place of the $P_i$ in the definition (2.1) of the $R_i$. Thus, we define

(5.1)  $\tilde{R}_i = \{x \in \mathbb{R}^2 : \alpha_i s_i(x) \ge \alpha_j s_j(x), \text{ all } j \ne i\}$,  $i = 1, \ldots, NC$.

When the equality $\alpha_i s_i(x) = \alpha_j s_j(x)$ holds, we put x in the set
$\tilde{R}_i$ provided i is the least integer j for which $\alpha_i s_i(x) = \alpha_j s_j(x)$.
The boundaries of the decision regions are contour lines defined by the
equations $\delta_{ij}(x) = \alpha_i s_i(x) - \alpha_j s_j(x) = 0$.

The problem of locating the 0-level contours of the $\delta_{ij}$ can be
handled by any standard contouring algorithm (see [20] for a review of
several methods). We now present an algorithm which is particularly
efficient for our applications since tensor-product splines can be
evaluated very efficiently on grids.


Algorithm 5.1. (To find the zero contour of a spline surface s):

1.  Choose positive integers ngx and ngy.

2.  Choose $xg_1 < \cdots < xg_{ngx}$ and $yg_1 < \cdots < yg_{ngy}$.

3.  Compute $s_{ij} = s(xg_i, yg_j)$, $i = 1, \ldots, ngx$; $j = 1, \ldots, ngy$.

4.  Triangulate the grid by drawing in upward sloping diagonals (see
Figure 2).

5.  Identify the edges of the triangulation whose endpoints have
opposite signs.

6.  Use inverse interpolation to compute an approximate crossing
point for each of the edges in step 5).

7.  Order the points on each contour and thread a curve through them.

In  practice  one  would  normally  choose  the  xg's  and  yg's  to  be  equally 
spaced.  The  choice  of  ngx  and  ngy  controls  the  resolution  of  the 
contouring  process  — large  values  of  these  parameters  will  give  a fine 
grid  and  correspondingly  better  resolution.  Step  3)  of  the  algorithm  can 
be  made  very  efficient  for  tensor-product  splines  (the  algorithm  can  be 
used  on  any  function  s which  can  be  evaluated  at  the  corners  of  the 
grid).  Step  7)  can  be  accomplished  using  straight  line  interpolation  if  a 
contour  consisting  of  a polygon  (with  sharp  corners)  is  acceptable. 
Otherwise  it  is  recommended  that  a parametric  spline  curve  (possibly  with 
some  tension)  be  used.  Steps  6)  and  7)  can  be  eliminated  altogether  if 
one  is  willing  to  accept  a polygonal  contour.  In  particular,  if  C is  a 
contour  of  s,  we  can  replace  it  by  the  boundary  of  the  set  D,  where  D is 
the smallest union of rectangles drawn from the set

(5.2)  $\{[xg_i, xg_{i+1}] \times [yg_j, yg_{j+1}] : 1 \le i < ngx, \; 1 \le j < ngy\}$

such that $C \subseteq D$. Figure 3 shows a typical situation when a smooth contour
has  been  replaced  by  a polygonal  one. 
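
Steps 3) through 6) of Algorithm 5.1 can be sketched as follows for any function s that can be evaluated at the grid points. The function name is hypothetical, linear inverse interpolation is assumed in step 6), and edges with an endpoint value of exactly zero are skipped for simplicity. The ordering of the points in step 7) is not attempted here.

```python
def zero_crossings(s, xg, yg):
    """Return approximate points where s = 0 on the edges of the grid
    triangulated with upward sloping diagonals."""
    vals = [[s(x, y) for y in yg] for x in xg]     # step 3: grid evaluation
    points = []

    def check(i0, j0, i1, j1):
        a, b = vals[i0][j0], vals[i1][j1]
        if a * b < 0.0:                            # step 5: opposite signs
            t = a / (a - b)                        # step 6: linear inverse interpolation
            points.append((xg[i0] + t * (xg[i1] - xg[i0]),
                           yg[j0] + t * (yg[j1] - yg[j0])))

    for i in range(len(xg)):
        for j in range(len(yg)):
            if i + 1 < len(xg):
                check(i, j, i + 1, j)              # horizontal edge
            if j + 1 < len(yg):
                check(i, j, i, j + 1)              # vertical edge
            if i + 1 < len(xg) and j + 1 < len(yg):
                check(i, j, i + 1, j + 1)          # upward sloping diagonal (step 4)
    return points
```

Applied to a function whose zero contour is the unit circle, all returned points lie close to the circle, with an error that shrinks as the grid is refined.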

§6.  Computing  The  Probability  Of  Misclassification. 

Suppose once again that $s_1, \ldots, s_{NC}$ are spline estimates of the
conditional probability densities $P_1, \ldots, P_{NC}$ associated with a
classification problem. Suppose in addition that $\tilde{R}_1, \ldots, \tilde{R}_{NC}$ are the
approximate Bayes decision regions defined in (5.1). Then an estimate
$\tilde{G}$ for the pmc G defined in (2.2) can be computed as follows:

(6.1)  $\tilde{G} = 1 - \sum_{i=1}^{NC} \alpha_i \int_{\tilde{R}_i} s_i(x) \, dx .$

As already observed earlier, it is possible to integrate
tensor-product splines exactly over rectangular sets. Thus, if we replace
the $\tilde{R}_i$ in (6.1) by $\bar{R}_i$, where $\bar{R}_i$ is the smallest union of rectangles drawn
from the set in (5.2) such that $\tilde{R}_i \subseteq \bar{R}_i$, we can compute all of
the integrals in the expression

$\bar{G} = 1 - \sum_{i=1}^{NC} \alpha_i \int_{\bar{R}_i} s_i(x) \, dx$

exactly. The estimate $\bar{G}$ can be made arbitrarily close to $\tilde{G}$ by taking a
sufficiently fine grid.

If the $s_i$'s are nonnegative (as they are supposed to be, although
in practice they may miss by a little; see Remark 4), then since
$\tilde{R}_i \subseteq \bar{R}_i$ for all i, we conclude that $\bar{G} \le \tilde{G}$. We can obtain an upper bound
for $\tilde{G}$ by replacing (for all i) $\tilde{R}_i$ by the largest set $\underline{R}_i$ which is a union
of rectangles drawn from the set (5.2) and which satisfies $\underline{R}_i \subseteq \tilde{R}_i$.
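
The lower and upper bounds can be illustrated with a short sketch in which the exact integrals of each $s_i$ over the grid rectangles are assumed to be already available. The function name, the cell numbering, and the numerical values are all hypothetical and serve only to show the bookkeeping.

```python
def pmc_from_cells(priors, cell_mass, cell_sets):
    """Return 1 - sum_i alpha_i * (integral of s_i over the cells in cell_sets[i]).
    cell_mass[i][c] is the exact integral of s_i over grid rectangle c."""
    return 1.0 - sum(a * sum(cell_mass[i][c] for c in cell_sets[i])
                     for i, a in enumerate(priors))

# Hypothetical example: two classes, four rectangles numbered 0..3.
priors = [0.5, 0.5]
cell_mass = [[0.6, 0.3, 0.1, 0.0],     # integrals of s_1 over the rectangles
             [0.0, 0.1, 0.4, 0.5]]     # integrals of s_2 over the rectangles
# Suppose the decision boundary cuts through rectangle 1 only.
outer = [{0, 1}, {1, 2, 3}]            # smallest unions containing each region
inner = [{0}, {2, 3}]                  # largest unions contained in each region
lower_bound = pmc_from_cells(priors, cell_mass, outer)
upper_bound = pmc_from_cells(priors, cell_mass, inner)
```

The gap between the two bounds comes entirely from the rectangles cut by a decision boundary, so it narrows as the grid is refined.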

§7.  Numerical  Results 

The  methods  discussed  above  have  been  implemented  in  FORTRAN,  and  we 
have  begun  a testing  program  utilizing  both  known  distributions  and  actual 
Landsat  data.  Figure  4 shows  a standard  bivariate  normal  distribution 
with  mean  (1,1)  and  variance  (.04, .04).  A sample  of  500  points  was  drawn 
from  this  distribution  using  a random  number  generator,  and  a histogram 
with  bin  width  (.25, .25)  was  constructed.  The  result  of  fitting  this 
histogram  with  a biquadratic  spline  is  shown  in  Figure  5.  A comparison  of 
the  views  clearly  indicates  a very  good  fit  has  been  achieved,  although  it 
should  be  noted  that  the  spline  in  Figure  5 does  take  small  negative 
values  at  some  points. 

Our  second  example  involves  actual  Landsat  data.  Figure  6 shows  a 
histogram  which  was  generated  from  500  samples  taken  from  Channels  1 and  4 
of  an  agricultural  scene.  Figure  7 shows  that  the  corresponding  spline 
fit  is  very  good,  despite  the  complexity  of  the  histogram. 


Figure 7. The spline fit to the histogram in Figure 6
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§8.  Remarks 


1.  The idea of fitting a spline to a histogram in such a way that the
area of the histogram in each bin matches the corresponding area of the
spline is due to Boneva, Kendall, & Stefanov [6]. Schoenberg [18] showed
how the idea could be carried over to bivariate histograms (see also [7]).

2.  The usual approach to fitting splines with area matching (cf. [7,18])
is to compute a cubic spline which fits the cumulative histogram, and then
to take its derivative. In the bivariate setting one may fit a bicubic
spline S to the cumulative histogram, and then the desired density fit is
given by $s = D_x D_y S$. In Section 4 we have presented an algorithm which
works directly with the biquadratic B-splines and the original histogram.

3.  Our  discussion  in  this  paper  has  concentrated  on  the  case  of  two 
dimensions.  Except  for  some  notational  difficulties,  there  is  no 
theoretical  problem  with  carrying  over  the  methods  presented  here  to 
higher  dimensional  cases.  There  are,  however,  some  practical  problems. 
Clearly  as  the  dimension  d increases,  the  storage  requirements  for  the 
parameters  of  the  histogram  and  spline  increase  correspondingly.  The  cost 
of  solving  the  analogous  linear  system  to  (4.1)  also  increases  with  d,  but 
the  tensor  technique  of  deBoor  [8,9]  can  still  be  used.  Additional 
difficulties  arise  in  connection  with  finding  the  Bayes  regions  and  in 
computing  the  pmc.  In  particular,  the  Bayes  regions  now  become  subsets  of 
d-dimensional  space,  and  their  boundaries  are  sets  in  d-1  dimensional 
space.  We  are  currently  planning  experiments  with  d = 3. 
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4.  The  volume-matching  algorithm  presented  in  Section  4 is  not  guaranteed 
to  produce  a spline  which  is  nonnegative.  (Figure  5 shows  a typical 
example  where  s takes  on  small  negative  values  at  some  points).  This  is 
not  an  ideal  situation  since,  after  all,  s is  supposed  to  be  approximating 
a density  function.  There  are  several  approaches  to  adjusting  s to  obtain 
nonnegativity.  For  example,  one  can  simply  replace  all  negative  values  by 
zero.  Alternatively,  one  can  add  small  positive  multiples  of  selected 
B-splines  to  achieve  nonnegativity.  We  are  currently  exploring  such 
post-processing  schemes. 

5.  Another  approach  to  achieving  nonnegative  spline  density  estimates  is 
to  determine  the  coefficients  of  the  spline  as  the  solution  of  some 
constrained  optimization  problem  with  side  constraints.  This  is  the 
approach  which  we  used  in  [13]  in  the  univariate  case.  A similar  approach 
can  be  carried  out  in  the  bivariate  case,  but  the  resulting  linear 
programming  problem  involves  a very  large  tableau  and  hence  requires 
considerable  storage  space  as  well  as  computational  time.  We  are 
currently  exploring  special  optimization  methods  for  approximation  by 
tensor-product  splines  which  will  take  advantage  of  the  tensor  nature. 
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ABSTRACT 


The  method  of  determining  asymptotic  confidence  bands  for 
autoregressive  spectra  due  to  Newton  and  Pagano  [3]  is  extended  to  the 
case  of  data  observed  in  the  plane.  One  Quadrant  Autoregressive  Models 
are  used  as  a basis  for  the  method. 


1.  INTRODUCTION
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The  analysis  of  data  observed  over  a regular  grid  of  points  in 
multi-dimensional space has become an important problem in recent years,
particularly  in  the  area  of  remotely  sensed  data  such  as  data  observed 
by  satellite.  Viewing  such  data  as  a realization  from  a time  series 
having  a vector  index  set  representing  the  location  of  the  observations, 
one  seeks  to  extend  the  results  for  scalar  index  set  to  the  more  general 
case.  The  purpose  of  this  paper  is  to  extend  the  method  of  one 
dimensional  autoregressive  spectral  confidence  bands  to  the  two 
dimensional  case.  The  extension  to  more  than  two  dimensions  is 
straightforward. 


2.  TWO  DIMENSIONAL  TIME  SERIES 


We say that the collection of random variables $\{X_{t,\tau} ; \; t, \tau = 0, \pm 1, \pm 2, \ldots\}$
is a (weakly) stationary two dimensional time series if

i)  $E(X_{t,\tau}) = \mu$ for all $t, \tau$,

ii)  there exists a function R (called the autocovariance function
of X) having integer valued arguments such that
$R(s,u) = \mathrm{Cov}(X_{t,\tau}, X_{t+s,\tau+u})$ for all $t, \tau$.

An important function for modeling and interpreting two dimensional
time series is the spectral density function f of X which is given by
(if it exists) the two dimensional Fourier Transform of R:

$f(\omega,\lambda) = \left(\frac{1}{2\pi}\right)^2 \sum_{s=-\infty}^{\infty} \sum_{u=-\infty}^{\infty} R(s,u) \, e^{-is\omega - iu\lambda} .$

Given data $X_{t,\tau}$, $t = 1, \ldots, n_1$, $\tau = 1, \ldots, n_2$ from a zero mean series X,
we may estimate (without assuming any parametric model for X) R by the
sample autocovariance function

$\hat{R}(s,u) = \frac{1}{n_1 n_2} \sum_{t=1}^{n_1-|s|} \sum_{\tau=1}^{n_2-|u|} X_{t,\tau} X_{t+|s|,\tau+|u|}$ ,  $|s| < n_1$, $|u| < n_2$,   (2.1)

and f by the windowed periodogram spectral estimate

$\hat{f}(\omega,\lambda) = \left(\frac{1}{2\pi}\right)^2 \sum_{s=-(n_1-1)}^{n_1-1} \sum_{u=-(n_2-1)}^{n_2-1} k(s,u) \hat{R}(s,u) \, e^{-is\omega - iu\lambda}$
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where  the  periodogram  I of  X is  given  by 
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and  k is  a suitably  chosen  lag  window,  while  the  spectral  window  K is 
given  by 


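$K(\omega,\lambda) = \left(\frac{1}{2\pi}\right)^2 \sum_{s=-(n_1-1)}^{n_1-1} \sum_{u=-(n_2-1)}^{n_2-1} k(s,u) \, e^{-is\omega - iu\lambda} .$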
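
The sample autocovariance function (2.1) can be computed directly from its definition. The sketch below assumes the data are stored as a list of rows X[t][tau] with 0-based indices and that the series has zero mean; the function name is hypothetical.

```python
def sample_autocov(X, s, u):
    """Sample autocovariance (2.1) at lag (s, u) for a zero mean
    n1 x n2 array X, using the absolute values of the lags."""
    n1, n2 = len(X), len(X[0])
    s, u = abs(s), abs(u)
    total = 0.0
    for t in range(n1 - s):
        for tau in range(n2 - u):
            total += X[t][tau] * X[t + s][tau + u]
    return total / (n1 * n2)
```

Note that the divisor is $n_1 n_2$ for every lag, as in (2.1), and not the number of products actually summed.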
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In  the  one  dimensional  case  an  alternative  model -oriented  method  of 
estimating  f is  to  assume  that  X can  be  adequately  approximated  by  a 
causal  autoregressive  model  in  which  case  f can  be  expressed  as  a 
function  of  a few  parameters  which  can  be  well  estimated  from  the  data. 
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the  idea  of  modeling  an  observation  at  time  t as  a linear  function  of 
finitely-many  previously  observed  values  is  intuitively  appealing  when 
the  index  set  is  time.  However,  defining  the  causal  analogue  for  spatial 
series  is  notoriously  difficult  (see  Whittle  [6]  and  Besag  [1]  for 
example) as there is no natural ordering of the data. Tjøstheim [4],
[5] has investigated analytically and numerically the efficacy of what
are  called  one-quadrant  autoregressive  models  as  approximating  models 
for  X.  In  this  paper  we  adopt  this  approach  and  find  simultaneous 
confidence  bands  for  the  spectral  density  function  of  such  a process. 


3.  ONE  QUADRANT  AUTOREGRESSIVE  PROCESSES 


The two dimensional time series X is called a one quadrant
autoregressive process of orders $p_1$, $p_2$ if

$\sum_{j=0}^{p_1} \sum_{k=0}^{p_2} a(j,k) X_{t-j,\tau-k} = \varepsilon_{t,\tau}$   (3.1)

where a(0,0) = 1 and $\varepsilon$ is a two dimensional white noise time series, i.e.
the $\varepsilon$'s are zero mean, uncorrelated random variables with common
variance $\sigma^2$. We write $X \sim \mathrm{QAR}(p_1, p_2, a, \sigma^2)$. Tjøstheim [5] discusses
sufficient conditions for a spatial series to be representable as a QAR
and suggests that the class of processes so representable is quite large.

If the complex valued polynomial

$g(z_1, z_2) = \sum_{j=0}^{p_1} \sum_{k=0}^{p_2} a(j,k) z_1^j z_2^k$

satisfies $g(z_1, z_2) = 0 \Rightarrow |z_1| > 1, \; |z_2| > 1$ then the QAR is called stable and
X is stationary. Also one has Yule-Walker type equations (obtained by
multiplying both sides of (3.1) by $X_{t-r,\tau-s}$ for $r, s \ge 0$, taking expectations
on both sides, and noting that $\varepsilon_{t,\tau}$ is uncorrelated with $X_{t-r,\tau-s}$ unless
r and s are both zero)

$\sum_{j=0}^{p_1} \sum_{k=0}^{p_2} a(j,k) R(j-r, k-s) = \delta_r \delta_s \sigma^2$ ,  $r, s \ge 0$   (3.2)

where $\delta_v$ is the Kronecker delta, i.e. $\delta_v$ is one if v = 0 and zero
otherwise. Also, the spectral density of X is given by

$f(\omega,\lambda) = \frac{\sigma^2}{(2\pi)^2} \, \frac{1}{\left| \sum_{j=0}^{p_1} \sum_{k=0}^{p_2} a(j,k) \, e^{-ij\omega - ik\lambda} \right|^2} .$   (3.3)


Solving (3.2) for $r = 0, \ldots, p_1$ and $s = 0, \ldots, p_2$ with the $\hat{R}$'s defined by
(2.1) replacing the R's, one can obtain estimators $\hat{\sigma}^2$ and $\hat{a}(j,k)$ which can
be inserted into (3.3) to obtain an estimator $\hat{f}$ of f. Justice [2]
describes a Levinson type algorithm for efficiently solving (3.2)
recursively for varying values of $p_1$ and $p_2$. Tjøstheim [5] discusses
methods of optimally choosing the orders $p_1$ and $p_2$ but we shall derive our
inferences contingent on having adequately chosen them.
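
Once coefficients and a noise variance are available, evaluating the QAR spectral density is a direct computation. The sketch below assumes the form $f = \sigma^2 / ((2\pi)^2 |\sum\sum a(j,k) e^{-ij\omega - ik\lambda}|^2)$ for (3.3), with the coefficients stored as a nested list a[j][k]; the function name is hypothetical.

```python
import cmath
import math

def qar_spectral_density(a, sigma2, omega, lam):
    """Evaluate the QAR spectral density at frequencies (omega, lam).
    a[j][k] holds a(j,k) for j = 0..p1 and k = 0..p2, with a[0][0] = 1."""
    g = sum(a[j][k] * cmath.exp(-1j * (j * omega + k * lam))
            for j in range(len(a)) for k in range(len(a[0])))
    return sigma2 / ((2.0 * math.pi) ** 2 * abs(g) ** 2)
```

For white noise (orders zero) the density is the constant $\sigma^2/(2\pi)^2$, which provides a simple sanity check.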


4.  ASYMPTOTIC  CONFIDENCE  BANDS  FOR  QAR 

In the one dimensional setting, the reciprocal of the autoregressive
spectral density is a linear combination of a finite number
of parameters that can be estimated by asymptotically normal estimators.
Thus Scheffé projections can be used to find asymptotic confidence bands
on the entire autoregressive spectral density (see Newton and Pagano
[3]). In this section we describe the analogous procedure for a
$\mathrm{QAR}(p_1, p_2, a, \sigma^2)$ process.

There are $p = (p_1+1)(p_2+1)-1$ a's to be estimated (a(j,k) for $j = 0, \ldots, p_1$,
$k = 0, \ldots, p_2$, but a(0,0) = 1). Define the vec operator on an n×m matrix
A having columns $a_1, \ldots, a_m$ to be the process of forming the nm×1 vector
a = vec(A) by $a = (a_1^T, \ldots, a_m^T)^T$. Let tvec(A) be the result of removing
the top element of vec(A).

Let A and R be $(p_1+1) \times (p_2+1)$ matrices having (j,k)th elements
a(j,k) and R(j,k), $j = 0, \ldots, p_1$, $k = 0, \ldots, p_2$ respectively. Then one can
write the Yule-Walker equations (3.2) in matrix form as

$\mathbf{R} \, \mathbf{a} = -\mathbf{r}$

where $\mathbf{a} = \mathrm{tvec}(A)$, $\mathbf{r} = \mathrm{tvec}(R)$, the first $p_1$ rows of $\mathbf{R}$ correspond to
(3.2) for s = 0 and $r = 1, \ldots, p_1$, the next $p_1+1$ are for s = 1 and $r = 0, \ldots, p_1$,
and the last $p_1+1$ are for $s = p_2$ and $r = 0, \ldots, p_1$. To form $\mathbf{R}$ explicitly, note
that $\mathbf{a}$ and $\mathbf{r}$ can be partitioned into a $p_1 \times 1$ vector $\mathbf{a}(0)$, $\mathbf{r}(0)$ followed by
$p_2$ vectors $\mathbf{a}(1), \ldots, \mathbf{a}(p_2)$; $\mathbf{r}(1), \ldots, \mathbf{r}(p_2)$ each of length $p_1+1$. Thus $\mathbf{R}$ can
be partitioned into a matrix containing $p_2+1$ rows and columns of blocks.
Call the (i,j)th block C(i,j), $i,j = 0, \ldots, p_2$. Then C(0,0) is $p_1 \times p_1$; C(0,j)
is $p_1 \times (p_1+1)$, $j = 1, \ldots, p_2$; C(j,0) is $(p_1+1) \times p_1$, $j = 1, \ldots, p_2$; while for
$j,k \ge 1$, C(j,k) is $(p_1+1) \times (p_1+1)$. Thus

$\sum_{k=0}^{p_2} C(j,k) \, \mathbf{a}(k) = -\mathbf{r}(j)$ ,  $j = 0, \ldots, p_2$.

A careful inspection of this equation shows that

$C_{\ell m}(0,0) = R(m-\ell, 0)$ ,  $\ell, m = 0, \ldots, p_1-1$

$C_{\ell m}(0,j) = C_{m\ell}(j,0) = R(m-\ell-1, j)$ ,  $j = 1, \ldots, p_2$;  $\ell = 0, \ldots, p_1-1$;  $m = 0, \ldots, p_1$

$C_{\ell m}(j,k) = R(m-\ell, k-j)$ ,  $j,k = 1, \ldots, p_2$;  $\ell, m = 0, \ldots, p_1$

and in fact $\mathbf{R}$ is the block Toeplitz matrix having (j,k)th block G(k-j)
where $G_{\ell m}(v) = R(m-\ell, v)$, $|v| \le p_2$, $\ell, m = 0, \ldots, p_1$, except that the first
row and column have been removed. Thus to form $\mathbf{R}$ one needs R(r,s) for
$|r| \le p_1$, $|s| \le p_2$.
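
The construction of the Yule-Walker system in tvec order can be sketched as follows. The function name is hypothetical, and the autocovariance is assumed to be supplied as a function R(s, u). The check uses a separable process, assumed here only for illustration, with $R(s,u) = \phi^{|s|} \psi^{|u|}$, for which the coefficients $a(1,0) = -\phi$, $a(0,1) = -\psi$, $a(1,1) = \phi\psi$ satisfy (3.2).

```python
def yule_walker_system(R, p1, p2):
    """Return (matrix, rhs) of the system matrix * a = rhs, where a = tvec(A)
    and rhs = -tvec of the autocovariances, built from the function R(s, u)."""
    # vec stacks columns: k (second index) varies slowest, j fastest;
    # tvec then drops the leading (0, 0) element.
    index = [(j, k) for k in range(p2 + 1) for j in range(p1 + 1)][1:]
    matrix = [[R(j - r, k - s) for (j, k) in index] for (r, s) in index]
    rhs = [-R(r, s) for (r, s) in index]
    return matrix, rhs

# Separable example with phi = 0.5 and psi = 0.3 (assumed values).
phi, psi = 0.5, 0.3
R_sep = lambda s, u: phi ** abs(s) * psi ** abs(u)
matrix, rhs = yule_walker_system(R_sep, 1, 1)
a_true = [-phi, -psi, phi * psi]          # tvec order: (1,0), (0,1), (1,1)
residual = [sum(m * a for m, a in zip(row, a_true)) - b
            for row, b in zip(matrix, rhs)]
```

In practice the sample autocovariances would replace R, and the resulting system would be solved for the coefficient estimates.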

Theorem (Tjøstheim [4])

Let $\hat{\mathbf{a}}$ be the solution to the Yule-Walker equations (3.2) $\hat{\mathbf{R}} \hat{\mathbf{a}} = -\hat{\mathbf{r}}$
where $\hat{\mathbf{R}}$ and $\hat{\mathbf{r}}$ are the same as $\mathbf{R}$ and $\mathbf{r}$ with $\hat{R}(j,k)$ replacing R(j,k).
Suppose X is a stable $\mathrm{QAR}(p_1, p_2, a, \sigma^2)$ where the white noise series $\varepsilon$ is
of independent, identically distributed random variables. Then as
$n_i \to \infty$, i = 1,2, we have that

$\sqrt{n_1 n_2} \, (\hat{\mathbf{a}} - \mathbf{a}) \to N_p(0, \sigma^2 \mathbf{R}^{-1})$ in distribution

(recall that $p = (p_1+1)(p_2+1)-1$).

Now the reciprocal of the QAR spectral density is given by

$f_Z(\omega,\lambda) = \frac{1}{f(\omega,\lambda)} = \frac{(2\pi)^4/\sigma^2}{(2\pi)^2} \left| \sum_{j=0}^{p_1} \sum_{k=0}^{p_2} a(j,k) \, e^{-ij\omega - ik\lambda} \right|^2$

which is the spectral density function of a two dimensional moving average
process Z of orders $p_1$ and $p_2$ having coefficients a(j,k) and noise variance
$(2\pi)^4/\sigma^2$ (see Tjøstheim [4]). Thus

$Z_{t,\tau} = \sum_{j=0}^{p_1} \sum_{k=0}^{p_2} a(j,k) \, \eta_{t-j,\tau-k}$

where $\eta$ is a white noise process having variance $(2\pi)^4/\sigma^2$. Now clearly
$R_Z(r,s) = \mathrm{Cov}(Z_{t,\tau}, Z_{t+r,\tau+s})$ is zero for $|r| > p_1$ and for $|s| > p_2$ while $R_Z(r,s)$
is a complicated function g of the a's and $\sigma^2$, i.e. $R_Z(r,s) = g(a, \sigma^2)$.
Thus

$f_Z(\omega,\lambda) = \frac{1}{(2\pi)^2} \sum_{|r| \le p_1} \sum_{|s| \le p_2} R_Z(r,s) \, e^{-ir\omega - is\lambda}$

$= \frac{1}{(2\pi)^2} \sum_{|r| \le p_1} \sum_{|s| \le p_2} R_Z(r,s) \, [\cos r\omega \cos s\lambda - \sin r\omega \sin s\lambda]$

which is a linear function of $R_Z(r,s)$ for $|r| \le p_1$, $|s| \le p_2$. Since $\hat{a}$ and $\hat{\sigma}^2$
are asymptotically normal by the above theorem, we have that the $\hat{R}_Z(r,s) = g(\hat{a}, \hat{\sigma}^2)$
are also asymptotically normal and one can use Scheffé projections
to get simultaneous confidence bands on $f_Z$ and thus on f. It remains to
obtain convenient expressions for the asymptotic covariances of the $\hat{R}_Z$ and
to determine the best way to display the bands graphically.
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ABSTRACT 


Satellite  estimates  of  agricultural  characteristics  often  are  not 
sufficiently precise for reliable use in small geographical regions. The
precision of estimates of agricultural characteristics such as crop
proportions and leaf area indexes can be increased by modelling ground
observations as a function of satellite estimates. The use of linear
regression models with least squares estimators of the model parameters is
most often advocated as an appropriate methodology; however, least squares
estimation requires that the predictor variables are measured without
error, an unreasonable assumption for this application. An alternative
estimation methodology which assumes that both the response variables
(ground observations) and the predictor variables (satellite estimates)
are measured with error involves the use of linear structural models. In
this paper the application of linear structural models to the estimation
of agricultural characteristics using satellite spectral measurements is
examined.
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1 . Introduction 


Satellite remote sensing is an important technology for rapid
collection  and  processing  of  spectral  information  on  agricultural  and 
vegetation  characteristics.  Estimation  of  crop  acreage  and  biomass  are 
but  two  of  the  many  potential  applications  of  satellite  remote-sensing 
technology.  However,  the  precision  needed  for  reliable  estimation  of 
agricultural  and  vegetation  characteristics  often  is  not  obtainable 
solely  from  satellite  estimates,  especially  for  geographical  regions  as 
small  as  counties.  For  this  reason  ground  observations  from  selected 
sample  locations  are  often  used  in  conjunction  with  satellite  spectral 
measurements  to  obtain  estimates  for  geographical  regions  of  interest.  In 
this  paper  the  use  of  linear  structural  models  to  obtain  estimates  of 
agricultural  characteristics  from  ground  observations  and  satellite 
measurements is investigated.

Linear  structural  models  assume  that  a variable  of  interest,  the 
response  variable  (Y),  is  a linear  function  of  another  measurement,  a 
stochastic  predictor  variable  (X): 

(1.1)  $Y = \alpha + \beta X .$

In addition, both the response variable and the predictor variable are
assumed to be measured with error; i.e., x and y are observable, where

(1.2)  $x = X + u$ and $y = Y + v .$

Classical linear regression models assume that the error in x is zero
(i.e., $u \equiv 0$) or at least negligible and that the predictor variables are
known constants.
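
The practical effect of ignoring the error u can be seen in a small simulation of (1.1) and (1.2). The sketch below is an illustration under assumed parameter values, with independent normal X, u, and v and a hypothetical function name. It fits ordinary least squares to the observable pairs (x, y); under these assumptions the fitted slope is expected to be close to $\beta \, \mathrm{var}(X) / (\mathrm{var}(X) + \mathrm{var}(u))$ and not to $\beta$.

```python
import random

def simulate_ols_slope(alpha, beta, sd_X, sd_u, sd_v, n, seed=0):
    """Simulate x = X + u and y = alpha + beta * X + v, then return the
    ordinary least squares slope of y on the observed x."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        X = rng.gauss(0.0, sd_X)                            # true predictor
        xs.append(X + rng.gauss(0.0, sd_u))                 # observed x = X + u
        ys.append(alpha + beta * X + rng.gauss(0.0, sd_v))  # observed y = Y + v
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# With var(X) = var(u) = 1 the slope is attenuated by a factor of about 1/2.
slope = simulate_ols_slope(alpha=1.0, beta=2.0, sd_X=1.0, sd_u=1.0, sd_v=0.5, n=20000)
```

The attenuation disappears as the standard deviation of u tends to zero, which is the classical regression setting described above.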
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Although  ground  observations  can  be  expected  to  have  less  error  than 
estimates  obtained  from  satellite  spectral  measurements,  both  ground 
observations  and  satellite  estimates  are  subject  to  measurement  error. 
For example, there are many sources of measurement error in the
calculation of leaf area indexes from ground observation: trees must be
felled  and  the  leaves  collected,  weighed,  and  their  individual  areas 
calculated.  Likewise,  satellite  spectral  measurements  are  subject  to 
several  sources  of  error  including  registration  error,  randomness 
associated  with  the  selection  of  segments  and  pixels  with  which  to  train 
classifiers,  and  technician  error  in  the  identification  of  pixels.  Thus 
structural  models  in  which  both  the  response  and  predictor  variables  are 
assumed  to  be  subject  to  measurement  error  present  a more  realistic 
framework  from  which  to  estimate  many  agricultural  and  vegetation 
quantities  from  satellite  measurements. 
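
The consequence of ignoring error in the predictor can be illustrated with a small simulation. The sketch below draws data from equations (1.1) and (1.2) under normality and fits ordinary least squares of y on x; the parameter values, the seed, and the function names are illustrative assumptions, not values taken from this paper's tables.

```python
import random
import statistics

def simulate(n, alpha, beta, mu_x, var_x, var_u, var_v, rng):
    """Draw n observable pairs (x, y) from equations (1.1) and (1.2)."""
    xs, ys = [], []
    for _ in range(n):
        X = rng.gauss(mu_x, var_x ** 0.5)            # true (unobservable) predictor
        Y = alpha + beta * X                          # equation (1.1)
        xs.append(X + rng.gauss(0.0, var_u ** 0.5))   # x = X + u
        ys.append(Y + rng.gauss(0.0, var_v ** 0.5))   # y = Y + v
    return xs, ys

def least_squares_slope(xs, ys):
    """Slope from the ordinary regression of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

# Hypothetical setting: beta = 3 and var(u) = var(X), so the least squares
# slope should settle near beta * var(X) / (var(X) + var(u)) = 1.5.
rng = random.Random(1984)
xs, ys = simulate(20000, alpha=1.0, beta=3.0, mu_x=0.0,
                  var_x=5.0, var_u=5.0, var_v=5.0, rng=rng)
slope = least_squares_slope(xs, ys)
print(round(slope, 2))
```

With equal variances for X and u the fitted slope is pulled toward half of the true slope, which is the attenuation that motivates the structural model.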

In  Section  2 of  this  paper  the  theoretical  properties  of  maximum 
likelihood  estimators  of  the  parameters  in  linear  structural  models  which 
assume  independent  normal  distributions  for  X,  u,  and  v are  outlined  and 
conditions under which these estimators reduce to least squares
estimators are noted. In Section 3 the dependence of linear structural
estimators on knowledge of the ratio of error variances (i.e.,
var(v)/var(u)) is assessed. Two applications of this methodology are
discussed in Section 4. Section 5 contains concluding remarks and
mentions several extensions which are currently under investigation.

2.  Linear Structural Models


Either of two assumptions can be added to the linear model defined
by equations (1.1) and (1.2) to define the nature of the true
(unobservable) predictor variable X. If the true values of the predictor
variable X are assumed to be constants, the model is referred to as a
linear functional model. If the true values of the predictor variable
are assumed to be stochastic, the model is referred to as a linear
structural model. The focus of this investigation is on linear
structural models for which the measurement errors and the predictor
variable X are assumed to be independent normal random variables:

(2.1)  $X \sim N(\mu_X, \sigma_X^2)$,  $u \sim N(0, \sigma_u^2)$,  and  $v \sim N(0, \sigma_v^2)$ .

Together equations (1.1), (1.2), and (2.1) constitute the linear
structural model of interest in this work.

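Under the assumptions stated above, the joint distribution of a random
sample of n observations is bivariate normal; i.e.,

(2.2)  $(x_i, y_i)' \sim N\left( \begin{pmatrix} \mu_X \\ \alpha + \beta\mu_X \end{pmatrix},\ \begin{pmatrix} \sigma_X^2 + \sigma_u^2 & \beta\sigma_X^2 \\ \beta\sigma_X^2 & \beta^2\sigma_X^2 + \sigma_v^2 \end{pmatrix} \right)$,  i = 1, 2, ..., n.

The maximum likelihood estimating equations for the parameters in
the joint distribution (2.2) are:

(2.3)  $\bar{x} = \hat{\mu}_X$,  $\bar{y} = \hat{\alpha} + \hat{\beta}\hat{\mu}_X$,  $s_x^2 = \hat{\sigma}_X^2 + \hat{\sigma}_u^2$,  $s_y^2 = \hat{\beta}^2\hat{\sigma}_X^2 + \hat{\sigma}_v^2$,  $s_{xy} = \hat{\beta}\hat{\sigma}_X^2$,

where $\bar{x}$ and $\bar{y}$ are sample means, $s_x^2$ and $s_y^2$ are
sample variances, and $s_{xy}$ is the sample covariance. There are six
model parameters which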
must be estimated from these five estimating equations; equivalently,
there are five sufficient statistics from which to estimate the six
model parameters. Without (a) knowledge of one or more model
parameters, (b) replication, or (c) the availability of one or more
additional variables ("instrumental variables") which are correlated
with X, the regression coefficients in (1.1) cannot be estimated
consistently under the normality assumptions (2.1) because the model
lacks identifiability (Reiersol [8]).

Satellite remote sensing does not generally allow the type of
experimental control which permits the collection of independent
replicated observations on both x and y. Likewise, satellite spectral
readings are usually converted to a single estimate of the
characteristic of interest, thereby precluding an instrumental
variables analysis. In these situations consistent estimation of the
model parameters requires some knowledge of the parameters themselves.

The model parameters which are of primary interest in the study of
the structural model defined by (1.1), (1.2), and (2.1) are $\alpha$,
$\beta$, and $\mu_X$. Since $\hat{\mu}_X = \bar{x}$ and
$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$, the estimation of $\beta$
presents the only serious problem to the estimation of model (1.1).
Consequently one must be able to assume some knowledge of the error
variances $\sigma_u^2$ and $\sigma_v^2$ in order to consistently
estimate the remaining model parameters, in particular to estimate
$\beta$. Kendall and Stuart ([4], Chapter 29) detail the solutions to
the likelihood equations when one or both of the error variances is
known, as well as the solution when the ratio of error variances
$\lambda = \sigma_v^2/\sigma_u^2$ is known.

The assumption that the ratio of error variances is known is
perhaps the most frequently cited condition which is imposed to
solve the likelihood equations. This assumption does not require
explicit knowledge of either of the error variances and one often
encounters analyses for which it is reasonable to conclude that
the error variances are equal (i.e., $\lambda = 1$). Knowledge of the
error variance ratio also ensures that the variance estimates
calculated from equations (2.3) are nonnegative, a property which
is not guaranteed when one or both of the error variances are
assumed known.

The solution for $\hat{\beta}$ when $\lambda$ is assumed known is

(2.4)  $\hat{\beta} = s(\lambda) + \mathrm{sign}(s_{xy})\{s^2(\lambda) + \lambda\}^{1/2}$,  $s(\lambda) = (s_y^2 - \lambda s_x^2)/(2 s_{xy})$ .

Due to the nonnegativity of the error variance estimators when
$\lambda$ is known, the estimating equations (2.3) provide the
following bounds on the magnitude of the structural model slope
estimator:

(2.5)  $|s_{xy}|/s_x^2 \le |\hat{\beta}| \le s_y^2/|s_{xy}|$ .

The lower bound in inequality (2.5) is the least squares estimator of
the slope parameter for the regression of y on x and the upper bound is
the inverse of the least squares estimator for the regression of x on y.
These two limits correspond to structural model estimators when it is
known that there is no error in the predictor or the response variable,
respectively. The latter estimator is also referred to in the literature
on linear calibration as a "classical" least squares estimator (e.g.,
Lwin and Maritz [7]).
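
The estimator (2.4) and the bounds (2.5) are simple to compute from the five sample moments. The following is a minimal sketch; the function names and the small data set are hypothetical and serve only to show that the estimate moves from the upper bound toward the lower bound as the assumed variance ratio grows.

```python
import math

def sample_moments(xs, ys):
    """Return (s_x^2, s_y^2, s_xy); the common divisor cancels in (2.4)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs) / n
    syy = sum((b - my) ** 2 for b in ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    return sxx, syy, sxy

def structural_slope(sxx, syy, sxy, lam):
    """Equation (2.4): slope estimate for an assumed variance ratio lam."""
    s = (syy - lam * sxx) / (2.0 * sxy)
    return s + math.copysign(1.0, sxy) * math.sqrt(s * s + lam)

def slope_bounds(sxx, syy, sxy):
    """Inequality (2.5): limits on the magnitude of the slope estimate."""
    return abs(sxy) / sxx, syy / abs(sxy)

# Hypothetical data, not taken from the paper's tables.
xs = [0.10, 0.22, 0.35, 0.41, 0.58, 0.63, 0.80]
ys = [0.12, 0.20, 0.39, 0.37, 0.61, 0.70, 0.77]
sxx, syy, sxy = sample_moments(xs, ys)
lower, upper = slope_bounds(sxx, syy, sxy)
for lam in (0.2, 1.0, 5.0):
    b = structural_slope(sxx, syy, sxy, lam)
    print(lam, round(b, 3), lower <= abs(b) <= upper)
```

Setting lam to zero reproduces the upper bound, and very large values of lam approach the least squares slope of y on x.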

Using the method of statistical differentials (e.g., Serfling [10],
Chapter 6) one can reexpress estimator (2.4) in a Taylor series
expansion about the true parameter value. By truncating this series one
can approximate the distribution and moments of $\hat{\beta}$. Anderson
([1]) cautions that the asymptotic properties so derived pertain to the
approximation to $\hat{\beta}$ and not to $\hat{\beta}$ itself;
nevertheless, the asymptotic moments of the Taylor series expansion
provide a potentially useful description of the behavior of the
structural model estimator.

Replacing the sample moments in (2.4) by their corresponding
parameter values one readily establishes the consistency of
$\hat{\beta}$. Applying the method of statistical differentials to a
first-order approximation to (2.4), the asymptotic variance of this
approximation to $\hat{\beta}$ is, to $O(n^{-2})$
(Lakshminarayanan and Gunst [5]):

(2.6)  $n^{-1}\{(\beta^2+\lambda)\gamma + \lambda\gamma^2\}$ ,

where $\gamma = \sigma_u^2/\sigma_X^2$ is the "noise-to-signal" ratio
for the observable predictor variable x. For comparative purposes, the
asymptotic mean squared error of the least squares estimator
$\hat{\beta}_{LS} = s_{xy}/s_x^2$ is (cf. Richardson and Wu [9],
equations (2.24) and (2.25)):

(2.7)  $\beta^2\gamma(n^{-1}+\gamma)(1+\gamma)^{-2} + n^{-1}\lambda\gamma(1+\gamma)^{-1}$ .
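
As a numerical illustration of (2.6) and (2.7), the sketch below evaluates both expressions for one setting. The function names and the chosen values (slope 3, both ratios equal to 1, sample size 200) are assumptions made for the example only.

```python
def avar_structural(beta, lam, gamma, n):
    """Equation (2.6): asymptotic variance of the structural slope estimator."""
    return ((beta ** 2 + lam) * gamma + lam * gamma ** 2) / n

def amse_least_squares(beta, lam, gamma, n):
    """Equation (2.7): asymptotic mean squared error of the least squares slope."""
    first = beta ** 2 * gamma * (1.0 / n + gamma) / (1.0 + gamma) ** 2
    second = lam * gamma / (n * (1.0 + gamma))
    return first + second

# Hypothetical setting: beta = 3, lambda = 1, gamma = 1, n = 200.
beta, lam, gamma, n = 3.0, 1.0, 1.0, 200
v_struct = avar_structural(beta, lam, gamma, n)
m_ls = amse_least_squares(beta, lam, gamma, n)
print(v_struct, m_ls, v_struct / m_ls)
```

The least squares value is dominated by the squared bias term, which does not shrink with n, whereas (2.6) decreases at rate 1/n.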

When the error variance ratio $\lambda$ is incorrectly specified, the
structural model estimator (2.4) is no longer consistent. Again ignoring
terms to $O(n^{-2})$, the asymptotic expectation and variance of a
first-order approximation to (2.4) using an assumed value of
$\lambda^*$ for the error variance ratio are (Lakshminarayanan and
Gunst [5]):

(2.8)  $E(\hat{\beta}) = g(\lambda^*) + \mathrm{sign}(\beta)\{g^2(\lambda^*) + \lambda^*\}^{1/2}$

(2.9)  $\mathrm{Var}(\hat{\beta}) = n^{-1}(\beta^2+\lambda^*)^{-2}[3\beta^2\gamma^2(\lambda-\lambda^*)^2 + (\beta^2+\lambda^*)^2\{(\beta^2+\lambda)\gamma + \lambda\gamma^2\}]$

where $g(\lambda^*) = \{(\beta^2-\lambda^*)\sigma_X^2 + (\lambda-\lambda^*)\sigma_u^2\}/(2\beta\sigma_X^2)$.
When $\lambda^* = \lambda$, $E(\hat{\beta}) = \beta$ and equation (2.9)
reduces to equation (2.6).

While the structural model estimator (2.4) and approximate
asymptotic properties such as equation (2.6) are routinely used, few
theoretical or empirical studies have been conducted to evaluate the
performance of $\hat{\beta}$ when (i) the error variance ratio is
incorrectly chosen, or (ii) only small samples of data are available.
Such evaluations are especially important for the present study since
one generally encounters small numbers of sample locations for which
both ground observations and satellite estimates are available and the
true variance ratio is not known. The next section presents a detailed
investigation of these two issues.
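
The effect of a misspecified variance ratio on the asymptotic moments (2.8) and (2.9) can be sketched numerically. The code below follows those two expressions as they are written in this section; the function names and parameter values are illustrative assumptions.

```python
import math

def expected_slope(beta, lam, lam_star, var_x, var_u):
    """Equation (2.8): asymptotic expectation when lam_star is assumed."""
    g = ((beta ** 2 - lam_star) * var_x
         + (lam - lam_star) * var_u) / (2.0 * beta * var_x)
    return g + math.copysign(1.0, beta) * math.sqrt(g * g + lam_star)

def variance_misspecified(beta, lam, lam_star, gamma, n):
    """Equation (2.9): asymptotic variance when lam_star is assumed."""
    a = beta ** 2 + lam_star
    inner = (3.0 * beta ** 2 * gamma ** 2 * (lam - lam_star) ** 2
             + a ** 2 * ((beta ** 2 + lam) * gamma + lam * gamma ** 2))
    return inner / (n * a ** 2)

# Hypothetical setting: beta = 3, var(X) = var(u) = 5, true lambda = 1.
beta, lam, var_x, var_u, n = 3.0, 1.0, 5.0, 5.0, 200
gamma = var_u / var_x
for lam_star in (0.2, 1.0, 5.0):
    print(lam_star,
          round(expected_slope(beta, lam, lam_star, var_x, var_u), 4),
          round(variance_misspecified(beta, lam, lam_star, gamma, n), 4))
```

An assumed ratio below the true one gives an expectation above the true slope, and an assumed ratio above the true one gives an expectation below it.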

3.  Variance  Ratio  and  Sample  Size  Effects 

Lakshminarayanan and Gunst [5] report the results of an
investigation on the effects of two factors on the performance of the
structural model estimator (2.4): the choice of the variance ratio
$\lambda$ and sample size. In this section the results reported by
Lakshminarayanan and Gunst are presented in greater detail. In
particular, (i) the structural model estimator is shown to be
insensitive to the choice of the variance ratio only when $\lambda$ is
large and $\gamma$ is small, (ii) the structural model estimator is
shown to possess a smaller mean squared error than least squares only
when the variance ratio is chosen in a relatively narrow neighborhood
of the true value, and (iii) satisfactory use of asymptotic formulae
for variance estimation requires a sample size in excess of 200.

Asymptotically (i.e., replacing the sample moments by their
parameter values),

(3.1)  $\partial\hat{\beta}/\partial\lambda = -\beta\gamma/(\beta^2 + \lambda)$ ,

indicating that the rate of change of $\hat{\beta}$ with respect to
$\lambda$ depends on the true values of $\beta$, $\lambda$, and
$\gamma$. Figure 1 is a graph of the relative rate of change
$|\partial\hat{\beta}/\partial\lambda|/\beta$; this figure illustrates
the general features of equation (3.1): $\hat{\beta}$ is relatively
insensitive to the true value of $\lambda$ for large values of
$\lambda$ and small values of $\gamma$ (holding $\beta$ fixed).
Together these two conditions imply that $\sigma_u^2$, the error
variance for the observable variable x, is small. In other words if
$\sigma_u^2$ is not negligible the linear structural estimator can be
very sensitive to the true value of $\lambda$, suggesting that an
incorrect choice of $\lambda$ could substantially alter the estimator.
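
Equation (3.1) can be checked numerically by substituting parameter values for the sample moments in (2.4) and differentiating with respect to the assumed ratio. The sketch below does this with a central difference; the names and the parameter values are assumptions for illustration.

```python
import math

def limiting_slope(lam_assumed, beta, lam_true, gamma):
    """Estimator (2.4) with sample moments replaced by parameter values.

    All moments are expressed in units of var(X), which cancels in (2.4).
    """
    var_x_obs = 1.0 + gamma                    # var(x) / var(X)
    var_y_obs = beta ** 2 + lam_true * gamma   # var(y) / var(X)
    cov_xy = beta                              # cov(x, y) / var(X)
    s = (var_y_obs - lam_assumed * var_x_obs) / (2.0 * cov_xy)
    return s + math.copysign(1.0, cov_xy) * math.sqrt(s * s + lam_assumed)

def sensitivity(beta, lam, gamma):
    """Equation (3.1): rate of change of the slope with respect to lambda."""
    return -beta * gamma / (beta ** 2 + lam)

# Hypothetical setting: beta = 3, lambda = 1, gamma = 1.
beta, lam, gamma, h = 3.0, 1.0, 1.0, 1e-6
numeric = (limiting_slope(lam + h, beta, lam, gamma)
           - limiting_slope(lam - h, beta, lam, gamma)) / (2.0 * h)
print(numeric, sensitivity(beta, lam, gamma))
```

The two printed values agree to several decimal places, and the limiting slope at the true ratio equals the true slope.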

Alternatively, one might wish to assess the sensitivity of
$\hat{\beta}$ to $\lambda$ when the variance ratio is assumed to be
stochastic rather than constant. Lindley and El-Sayyad [6] propose that
a Uniform($k^{-1}$, $k$) prior for $\lambda$ be assumed if the
measurement errors are believed to be of the same magnitude. Other
reasonable priors include $N(k, \sigma^2)$ and Chisquare($k$)
distributions. If one approximates the expectation of (3.1) using a
three-term Taylor series expansion, the approximate expectations for
the above three priors are, respectively,

(3.2)  $-2\beta\gamma[(2\beta^2+k+k^{-1})^{-1} + (k-k^{-1})^2\{3(2\beta^2+k+k^{-1})^3\}^{-1}]$

(3.3)  $-\beta\gamma\{(\beta^2+k)^{-1} + \sigma^2(\beta^2+k)^{-3}\}$

(3.4)  $-\beta\gamma\{(\beta^2+k)^{-1} + 2k(\beta^2+k)^{-3}\}$ .
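
Expressions (3.2)-(3.4) are all instances of one second-order expansion of (3.1) about the prior mean, with the prior variance entering through the second derivative. A minimal sketch, with hypothetical function names and parameter values, makes the common structure explicit.

```python
def taylor_expectation(beta, gamma, mean, var):
    """Second-order expansion of E[-beta*gamma/(beta^2 + lambda)] about the prior mean."""
    d = beta ** 2 + mean
    return -beta * gamma * (1.0 / d + var / d ** 3)

def mean_sensitivity_uniform(beta, gamma, k):
    """Equation (3.2): lambda ~ Uniform(1/k, k)."""
    d = 2.0 * beta ** 2 + k + 1.0 / k
    return -2.0 * beta * gamma * (1.0 / d + (k - 1.0 / k) ** 2 / (3.0 * d ** 3))

def mean_sensitivity_normal(beta, gamma, k, var_lam):
    """Equation (3.3): lambda ~ N(k, var_lam)."""
    d = beta ** 2 + k
    return -beta * gamma * (1.0 / d + var_lam / d ** 3)

def mean_sensitivity_chisquare(beta, gamma, k):
    """Equation (3.4): lambda ~ chi-square with k degrees of freedom."""
    d = beta ** 2 + k
    return -beta * gamma * (1.0 / d + 2.0 * k / d ** 3)

# Hypothetical setting: beta = 3, gamma = 1, k = 2.
beta, gamma, k = 3.0, 1.0, 2.0
uniform_mean = (k + 1.0 / k) / 2.0          # mean of Uniform(1/k, k)
uniform_var = (k - 1.0 / k) ** 2 / 12.0     # variance of Uniform(1/k, k)
print(mean_sensitivity_uniform(beta, gamma, k),
      taylor_expectation(beta, gamma, uniform_mean, uniform_var))
print(mean_sensitivity_chisquare(beta, gamma, k),
      taylor_expectation(beta, gamma, k, 2.0 * k))
```

Each pair of printed values coincides, since the uniform prior has mean (k + 1/k)/2 and variance (k - 1/k)^2/12 and the chi-square prior has mean k and variance 2k.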

Equations (3.2)-(3.4) (divided by $\beta$) are graphed in Figures 2-4.
The same overall conclusions drawn from Figure 1 are apparent in these
graphs: the structural model estimator (2.4) is relatively insensitive
to the true value of the variance ratio only when $\lambda$ ($k$) is
large and $\gamma$ is small.

The asymptotic moments (2.8) and (2.9) demonstrate two important
properties of the structural model estimator when the variance ratio is
chosen incorrectly. When $\lambda$ is chosen incorrectly equation (2.8)
shows that $\hat{\beta}$ is biased. In addition, $\hat{\beta}$ has a
larger variance (compare equations (2.6) and (2.9)) than the structural
model estimator which is obtained with the correct value of $\lambda$.
Thus not only is the structural model estimator sensitive to the choice
of the variance ratio but its mean squared error properties are also
affected by both the true value of $\lambda$ and by an incorrect choice
of the variance ratio.

Figure 5 is a graph of the ratio of the asymptotic variance,
equation (2.6), of the structural model estimator to the mean squared
error, equation (2.7), of the least squares estimator (recall that the
structural model estimator is asymptotically unbiased). In this figure
$\beta$, $\sigma_X^2$, and $\sigma_v^2$ are fixed at 3, 5, and 10,
respectively, so that by varying $\sigma_u^2$ both $\lambda$ and
$\gamma$ are simultaneously varied; in particular, small values of
$\sigma_u^2$ correspond to large values of $\lambda$ and small values
of $\gamma$. As the figure indicates, unless $\lambda$ is very small,
corresponding to a large error variance for the predictor variable
relative to that of the response variable, the structural model
estimator has a smaller asymptotic mean squared error than the least
squares estimator.

Figures 6-8 display ratios of the asymptotic mean squared errors of
the structural model estimator with an incorrect choice of the variance
ratio to that of the least squares estimator. These figures use the
same model parameters as does Figure 5 but with $\sigma_u^2$ selected
so that the true variance ratio is 1, 6, and 10, respectively. The
figures demonstrate that the assumed variance ratio must be chosen in a
relatively narrow interval around the true value in order for the
structural model estimator to have a smaller mean squared error than
least squares.

The foregoing theoretical properties are asymptotic and do not
necessarily indicate properties of the structural model estimator for
finite sample sizes. It is particularly important to assess the
behavior of $\hat{\beta}$ for finite sample sizes because the
asymptotic moments which were derived in the last section pertain to a
Taylor series approximation to $\hat{\beta}$ and not to the true
distribution of the structural model estimator.

For Tables 1-4, 1000 replications of samples of size n were generated
from the structural model defined by equations (1.1), (1.2), and (2.1)
using normal variates from I.M.S.L. subroutine GGNML on a C.D.C. 6600
computer. Unless otherwise specified, $\beta$, $\sigma_u^2$, and
$\sigma_X^2$ are fixed at 3, 5, and 5, respectively (thus
$\gamma = 1$). By varying $\sigma_v^2$ the results are only a function
of $\lambda$ and n. The values in the tables are displayed as a
function of the true value of $\lambda$ and an assumed value,
$\lambda^*$. Correct choices of the variance ratios correspond to
entries for which $\lambda^* = \lambda$.

Table 1 displays ratios of the average of 1000 $\hat{\beta}$ values
calculated from equation (2.4) to the true value of $\beta$. For samples
of size n = 50 and 100 the maximum relative error in estimating $\beta$
using the correct value of the variance ratio is 4%. Incorrectly
choosing $\lambda^*$ larger than the true variance ratio results in
underestimation of $\beta$ whereas too small a value of $\lambda^*$
results in overestimation of $\beta$.

Estimated and asymptotic mean squared errors for the structural
model estimator are compared in Table 2. Estimated mean squared errors
are calculated from the usual formula:

$\mathrm{mse} = \sum_{i=1}^{1000} (\hat{\beta}_i - \beta)^2 / 1000$

and asymptotic mean squared errors are calculated from equation (2.6)
using the true parameter values. The ratios in Table 2 corresponding to
the correct choice of the variance ratio indicate that use of asymptotic
formulae for moments of structural model estimators cannot be
recommended for samples of size n = 100 or less. Errors of 15-30%
between sample and theoretical mean squared errors occur for samples of
size 100 when the variance ratio is correctly chosen; much larger
errors occur when the variance ratio is incorrectly chosen.
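
A comparison of this kind can be reproduced in outline with a short Monte Carlo run. The sketch below is not the original I.M.S.L. program: it uses Python's standard generator, a hypothetical seed, 300 replications rather than 1000 to keep the run short, and a zero intercept and zero mean for X, neither of which affects the slope.

```python
import math
import random

def structural_slope(xs, ys, lam):
    """Equation (2.4) computed from raw observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    s = (syy - lam * sxx) / (2.0 * sxy)
    return s + math.copysign(1.0, sxy) * math.sqrt(s * s + lam)

def simulated_mse(n, reps, beta, var_x, var_u, var_v, lam_star, rng):
    """Average squared error of the slope estimate over reps simulated samples."""
    sd_x, sd_u, sd_v = math.sqrt(var_x), math.sqrt(var_u), math.sqrt(var_v)
    total = 0.0
    for _ in range(reps):
        true_x = [rng.gauss(0.0, sd_x) for _ in range(n)]
        xs = [t + rng.gauss(0.0, sd_u) for t in true_x]         # x = X + u
        ys = [beta * t + rng.gauss(0.0, sd_v) for t in true_x]  # y = beta*X + v
        total += (structural_slope(xs, ys, lam_star) - beta) ** 2
    return total / reps

# Hypothetical setting: beta = 3, all three variances equal to 5, n = 200.
beta, var_x, var_u, var_v, n = 3.0, 5.0, 5.0, 5.0, 200
lam, gamma = var_v / var_u, var_u / var_x
asymptotic = ((beta ** 2 + lam) * gamma + lam * gamma ** 2) / n  # equation (2.6)
mse = simulated_mse(n, 300, beta, var_x, var_u, var_v, lam, random.Random(6600))
print(mse / asymptotic)
```

Passing a value of lam_star different from lam to simulated_mse gives the corresponding off-diagonal comparison, in which the simulated error also contains the squared bias.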

Tables 3 and 4 display ratios of sample and asymptotic mean squared
errors for samples of size n = 200 and several values of $\beta$,
$\gamma$, $\lambda$, and $\lambda^*$. When $\lambda$ is correctly
chosen the ratios are much closer to 1.0 in these tables than in
Table 2. If relative errors of approximately 10% or less are
acceptable, these tables indicate that samples of size n = 200 could be
considered minimally acceptable for a wide range of model parameters.
These tables also demonstrate that $\lambda^*$ must be selected near
its true value for the asymptotic variance formula (2.6) to provide a
reasonable assessment of the variability of $\hat{\beta}$. When
$\beta$ is small it is especially undesirable to choose values of
$\lambda^*$ which are less than the true ones. The deleterious effects
of erroneous selection of the variance ratio decrease with larger
values of $\beta$ and smaller values of $\gamma$; moreover, when
$\gamma = 0.1$ and $\beta = 10$ the ratios in Table 3 indicate a
relative error of 15% or less for most of the cases in which
$\lambda^* \ne \lambda$.

4.  Applications 

In  this  section  the  use  of  the  structural  model  estimator  (2.4)  is 
examined on two data sets for which ground observations and satellite
classifier estimates are available for the proportion of corn grown in
each  of  several  segments  or  portions  of  segments.  No  attempt  is  made  to 
obtain  a final  prediction  equation  for  these  examples;  rather,  they  are 
used  merely  to  illustrate  several  important  features  of  structural  model 
estimators. 

The first example is taken from Badhwar, Carnes, and Austin [3].
The data set consists of 41 segments for which the proportion of corn
in the segments has been determined from ground observation. The
satellite estimates are obtained from a temporal model of crop greenness
(Badhwar [2]). Figure 9 is a scattergram of the ground truth
proportions versus the Badhwar estimates of the proportions taken from
the raw data listed in Table 5.

For illustration purposes, assume that the structural model defined
by equations (1.1), (1.2), and (2.1) is an adequate representation of
the relationship between the true proportions and their satellite
estimates. Note that the variance ratio $\lambda$ is not known.
Inequality (2.5) provides bounds on the structural model slope estimate:

$1.001 \le \hat{\beta} \le 1.291$.

Thus the slope estimate is bounded in a relatively narrow interval.
Table 6 lists estimates for several values of the variance ratio.
Observe that the greatest change in the estimates occurs for variance
ratios in the interval [0,1). If it is believed that ground observation
is subject to less error than satellite estimates, the variance ratio
would be expected to be in this interval.

In a forthcoming report, uniform and beta distributions are assumed
for the unobservable predictor variable X. The use of nonnormal
distributional assumptions for X enables one to obtain estimates of the
variance ratio without any knowledge of the error variances. The
estimates obtained for the variance ratio under the uniform and beta
assumptions are .501 and 1.146, respectively. These values of $\lambda$
correspond to slope estimates of 1.206 and 1.146. The similarity of
these estimates and those in Table 6 suggests that knowledge of the
exact value of the variance ratio may not be critical for this data set.

The second data set was obtained from a modelling of greenness
values using a mixture of Weibull distributions as discussed in Woodward
et al. [11]. Random samples of 200 pure pixels were obtained from a
single segment of corn and soybean crops, the proportion of corn in each
of the 61 samples was calculated, and minimum distance estimates of the
corn proportions were obtained. Figure 10 exhibits a scattergram
plotted from the raw data in Table 7. Note the large amount of
variability in the classifier estimates (x) relative to the true corn
proportions (y).

The purpose in examining this data set is to illustrate the
behavior of the structural model estimator when the data contain more
variability in the predictor variable than in the response variable.
Again using inequality (2.5), the structural model slope estimate is
bounded by the following values:

$.100 \le \hat{\beta} \le 1.717$.

This  interval  is  much  wider  than  the  interval  for  the  previous  example 
and  suggests  that  greater  uncertainty  surrounds  the  choice  of  the 
variance ratio. Table 8 displays estimates for a range of $\lambda$ values in
the interval [0,1). The dramatic drop in the estimates for very small
variance  ratios  makes  choice  of  an  appropriate  structural  model  estimate 
difficult  if  the  variance  ratio  is  believed  to  be  small,  an  assumption 
which  is  supported  by  Figure  10.  If  one  again  calculates  the  estimates 
under  uniform  and  beta  assumptions  on  X,  the  variance  ratio  estimates 
are  .271  and  .091,  respectively,  corresponding  to  structural  model  slope 
estimates  of  .222  and  .909. 


5.  Concluding  Remarks 

Linear structural models acknowledge the presence of measurement
error in both the response variable and the predictor variable, thereby
allowing a more realistic representation of the relationship between
ground observations and satellite estimates than least squares
estimation of the parameters of linear regression models. In this paper
the potential for application of linear structural models to the
estimation of agricultural and vegetation characteristics is
investigated. Assuming normal probability distributions for the
unobservable predictor variable and the two measurement errors
necessitates some knowledge of the error variances in order to estimate
the model parameters. The focus of this study is on the assumption that
the ratio of error variances is known.

The  asymptotic  properties  presented  in  Section  2 demonstrate  that 
when  an  incorrect  variance  ratio  is  used  the  structural  model  estimator 
is  biased  and  has  a larger  variance  than  the  corresponding  estimator 
which  uses  the  correct  variance  ratio.  The  simulations  in  Section  3 
show  that  samples  as  small  as  50  or  so  allow  acceptable  estimation  of 
the slope parameter when the variance ratio is known. Samples as large
as 200 or more are necessary for asymptotic variance formulae to provide
good measures of the variability of the estimator. Likewise, assumed
values  of  the  error  variance  ratio  in  a narrow  interval  around  the  true 
value  are  necessary  both  for  accurate  estimation  of  the  slope  parameter 
and  for  acceptable  estimation  of  the  estimator  variability. 

The  structural  model  estimator  was  applied  to  two  data  sets  on  crop 
proportion  estimation  in  Section  4.  In  one  of  the  data  sets  the  precise 
selection  of  the  variance  ratio  was  not  found  to  be  critical  to  the 
obtaining  of  suitable  parameter  estimates  because  the  structural  model 
estimator  changed  relatively  little  over  a wide  range  of  values  of  the 
variance ratio. In the second example the choice of the variance ratio
affected the estimator more, leaving greater uncertainty surrounding the
appropriate value to use.

Many opportunities exist for improving the application of
structural model estimators to the estimation of agricultural and
vegetation characteristics. When the true predictor variable X is
normally distributed, estimation of the slope parameter can be
accomplished with replication or with the use of instrumental
variables. The application of structural estimation in these two
situations will be detailed in future reports.

Theoretically  the  estimation  problems  described  in  this  paper  are 
not  encountered  if  the  unobservable  predictor  variable  is  nonnormally 
distributed.  Estimates  of  the  variance  ratio  for  both  of  the  examples 
in  Section  4 were  obtained  under  assumptions  of  uniform  and  beta 
distributions on X. Nonnormal assumptions for X present challenging
theoretical and computational problems which will also be documented in
a future report.

Table 1.  Ratio of Simulated and Asymptotic Expectations of Structural
Model Slope Estimators

Assumed $\lambda^*$


0.2 

1.16 

1.04 

1.0 

1.21 

1.13 

2.0 

1.40 

1.29 

0.0 

2.49 

2.08 

77 

0.62 

81 

0.66 

90 

0.69 

26 

1.10 

74 

0.62 

81 

0.64 

87 

0.67 

62 

1.04 

74 

0.61 

79 

0.64 

87 

0.67 

56 

1.03 

Table 2.  Ratio of Simulated and Asymptotic Mean Squared Errors of
Structural Model Slope Estimators


Assumed $\lambda^*$

True $\lambda$

0.2 

0.5 

1.0 

108.11 

9.91 

35,78 

4553.13 


2.69 

3.32 

10.48 

996.18 


550.84 

80.04 

3087.32 


2 


^3:.  31 
1.99 
3.42 
357.29 


2.48 

1.66 

2.38 

1563.05 


1.27 

2.21 

3.79 

36.15 


1.15 

1.92 

3.08 

30.49 


(b)  n = 50


(c)  n = 100

ABSTRACT


Calibration and inverse regression estimators of crop proportions
are investigated where the auxiliary variable is obtained from binary
classification of multivariate Landsat data. We argue that the
appropriate model relating classifier proportions and ground observed
proportions for a given crop type is the calibration model. We then
show, however, that under this model the inverse regression estimator
is superior to the calibration estimator in estimating the crop acreage
or proportion for a region of interest.


1.  INTRODUCTION 


The  Statistical  Reporting  Service  of  the  United  States  Department 
of  Agriculture  (USDA)  applies  probability  sample  survey  methodology  to 
obtain  crop  acreage  estimates.  Each  year,  a survey,  known  as  the  June 
Enumerative  Survey  (JES),  is  conducted  in  the  United  States  to  collect 
land  use  and  crop  acreage  data.  These  data  are  collected  for  randomly 
selected  area  segments.  The  sampling  error  at  the  national  level  is 
believed  to  be  about  2 percent.  At  the  state  and  lower  levels,  the 
sampling error is considerably larger. Sigman et al. [8] proposed using
a regression  estimation  approach,  based  on  Landsat  data  in  conjunction 
with  the  sample  survey  data,  to  decrease  the  sampling  error  at  these 
lower  levels. 

Basically,  the  approach  is  to  acquire  Landsat  data  over  a stratum, 
called  an  analysis  district,  containing  a number  of  JES  sample  segments. 
The  Landsat  data  are  classified,  using  data  from  the  sample  segments  for 
training,  and  selected  crop  acreage  or  proportion  estimates  are  obtained 
for  each  sample  segment  in  the  stratum  as  well  as  for  the  entire 
stratum.  The  crop  acreages  for  the  sample  segments  observed  in  the  JES 
are  regressed  onto  the  corresponding  estimates  obtained  from  the 
classification  of  the  Landsat  data  and  the  resulting  relationship  is 
used  to  obtain  an  estimate  of  crop  acreage  for  the  stratum  from  the 
classifier estimate for the stratum.

In general, this sample survey problem can be stated as follows.
Consider a population made up of N clusters. Assume that each cluster
contains a large number, M, of units. (In the above context, a unit is
a pixel.) Let $C_1$ represent the class of interest and suppose a unit
either belongs to $C_1$ or its complement $C_0$. For cluster i, let
$Y_i$ be the proportion of units belonging to $C_1$. Next, let Z be a
p x 1 measurement vector observed for each unit in the population.

Suppose n clusters are randomly selected from the N clusters and
their units are enumerated and correctly identified with respect to the
two classes $C_1$ and $C_0$. The actual proportion of units in $C_1$ is
then known for each of the n sampled clusters. The set of observations
on Z for the units of the sampled clusters is used to obtain a
discriminant function and a classification rule. Each unit in the
population is then classified based on its observation for the
measurement vector Z. Let $X_i$ denote the proportion of units
classified into $C_1$ for cluster i. The problem is to estimate the
population mean

$\bar{Y} = \sum_{i=1}^{N} Y_i / N$ .

For the n sampled clusters, suppose $y_1, y_2, \ldots, y_n$ are the
actual proportions of units in $C_1$ and $x_1, x_2, \ldots, x_n$ are
the corresponding estimates obtained from classification of units. Let

$\bar{X} = \sum_{i=1}^{N} X_i / N$

be the average proportion obtained from classification of units in the N
clusters making up the population.

The estimator of $\bar{Y}$ considered in Sigman et al. [8] is the
standard regression estimator (Cochran [1]) which is based on the
regression of the $y_i$ onto the $x_i$ for the sampled clusters. This
is an inverse regression since the $Y_i$ are independent variables,
that is, variables that take on values that can be observed but not
controlled, and the $X_i$ are dependent variables, variables whose
values depend on changes in the independent variables. The actual
relationship between the $Y_i$ and the $X_i$ depends upon the overlap
between the class distributions of the measurement vector Z for $C_1$
and $C_0$, and the classification procedure.

Since the $X_i$ are dependent on the independent variables $Y_i$,
another estimator of $\bar{Y}$ can be obtained using calibration, which
is based on the direct regression of the $x_i$ on the $y_i$ for the
sampled clusters. In 1967, Krutchkoff [5] advocated the use of inverse
regression based on the results of extensive simulation studies. Since
then, the controversy over the properties and hence the utility of
these two estimators has been extensively discussed in the literature
(Lwin and Maritz [6]). However, the studies in the literature assume a
random sample $(x_i, y_i)$, i = 1, ..., n, from an infinite population
with the goal of estimating an individual $y_i$ for a given value
$x_i$. Moreover, none of the previous studies address the problem of
classification. Hence, the present study is novel.


As in previous studies, assume that the following linear model
holds true:

(1.1)  $X_i = \gamma + \delta Y_i + \eta_i$

where $\gamma$ and $\delta$ are unknown parameters and the $\eta_i$ are random
errors (independent of the $Y_i$) having mean zero and finite variance. If
the $\eta_i$ are normally distributed, Shukla [7] has shown that the
calibration estimator of $Y_i$ for a given $X_i$ is consistent but has
infinite variance, whereas the inverse regression estimator is biased but
has finite variance. Without assuming a normal distribution for the
$\eta_i$, Lwin and Maritz [6] have shown that the inverse regression
estimator has lower mean squared error than the calibration estimator in
estimating $Y_i$ for a given $X_i$, if $Y_i$ lies in the range of the sample
$y_1, \ldots, y_n$. This is very likely to be the case in the present study,
where an estimate of $\bar{Y}$ is desired, since $y_1, \ldots, y_n$ are a
random sample from the finite population whose mean is $\bar{Y}$.

The calibration and inverse regression estimators of $\bar{Y}$ are
described in section 2. In section 3, we discuss the classification of
units based on the measurement vector Z and investigate the two models
relating the $Y_i$ and the $X_i$. It is shown that the calibration model
given by (1.1) is linear when the $X_i$ are obtained using the maximum
likelihood classification rule, whereas the inverse model is not. A
simulation study was conducted to compare the calibration and inverse
regression estimators of $\bar{Y}$. A description of the simulation study
and the results are presented in section 4. The results are summarized in
section 5.
2. THE ESTIMATORS

For measurement units, define the random variable

(2.1)  $\eta(Z) = \begin{cases} 1, & u \in C_1 \\ 0, & u \in C_0 \end{cases}$

where Z is the observation vector of unit u. Suppose units in the
population are stratified using a classification rule and $C_1'$ and $C_0'$
are the strata corresponding to classes $C_1$ and $C_0$, respectively.
Define another random variable

(2.2)  $\psi(Z) = \begin{cases} 1, & u \in C_1' \\ 0, & u \in C_0' \end{cases}$

The pair of random variables $(\eta(Z), \psi(Z))$ characterizes the two-way
classification of units: actual vs. classified. If $\eta(Z) = \psi(Z)$ for
all units, the classification is perfect; otherwise, it is fallible.


Assume that the cluster size is large and, for a cluster, $Y = E[\eta(Z)]$
and $X = E[\psi(Z)]$. Considering the relative frequencies of units in $C_1$
and $C_1'$ approximated by the expected values of the random variables
$\eta(Z)$ and $\psi(Z)$, one can write Y and X as probabilities:

(2.3)  $Y = P[\eta(Z) = 1]$
       $X = P[\psi(Z) = 1]$.

Similarly, the two classification error rates are approximated by the
conditional probabilities

(2.4)  $\theta_0 = P[\psi(Z) = 0 \mid \eta(Z) = 1]$
       $\theta_1 = P[\psi(Z) = 1 \mid \eta(Z) = 0]$

where $\theta_0$ is called the omission error and $\theta_1$ is called the
commission error. From (2.3) and (2.4), it follows that

(2.5)  $X = \theta_1 + (1 - \theta_0 - \theta_1) Y$.
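The step from (2.3) and (2.4) to (2.5) is the law of total probability applied to the event $\psi(Z) = 1$. Written out, using only the quantities already defined above:

```latex
\begin{align*}
X &= P[\psi(Z)=1] \\
  &= P[\psi(Z)=1 \mid \eta(Z)=1]\,P[\eta(Z)=1]
   + P[\psi(Z)=1 \mid \eta(Z)=0]\,P[\eta(Z)=0] \\
  &= (1-\theta_0)\,Y + \theta_1\,(1-Y) \\
  &= \theta_1 + (1-\theta_0-\theta_1)\,Y .
\end{align*}
```

The slope $1 - \theta_0 - \theta_1$ equals 1 only for a perfect classifier and shrinks toward 0 as the two error rates grow.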



Thus, for a cluster, X is a linear function of Y. Upon considering the
variability in $\theta_1$ and $\theta_0$ across clusters, suppose the
regression function of X onto Y is of linear form, say

(2.6)  $E[X \mid Y = y] = \gamma + \delta y$.


Suppose n clusters are randomly selected and the $(x_i, y_i)$,
$i = 1, 2, \ldots, n$, are the pairs of observations for the random
variables X and Y. Then by regressing $x_i$ on $y_i$, a regression estimator
of the population mean $\bar{Y} = \sum_{i=1}^{N} Y_i / N$ is given by

(2.7)  $\hat{\bar{Y}}_C = \bar{y} + (\bar{X} - \bar{x}) / \hat{\delta}$

where

    $\bar{X} = \sum_{i=1}^{N} X_i / N$,

    $\hat{\delta} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / \sum_{i=1}^{n} (y_i - \bar{y})^2$,

    $\bar{x} = \sum_{i=1}^{n} x_i / n$,  $\bar{y} = \sum_{i=1}^{n} y_i / n$.

In (2.7), $\bar{X}$ is assumed to be known and the subscript C stands for
calibration. This estimator will be called the calibration estimator.


On the other hand, Y can be written in terms of X for a cluster by
inverting (2.5) as follows:

(2.8)  $Y = -\theta_1 / (1 - \theta_0 - \theta_1) + X / (1 - \theta_0 - \theta_1)$
         $= \phi_0 + \phi_1 X$

where

(2.9)  $\phi_0 = -\theta_1 / (1 - \theta_0 - \theta_1)$
       $\phi_1 = 1 / (1 - \theta_0 - \theta_1)$.

Again, $\phi_0$ and $\phi_1$ would vary across clusters, and thus one may
consider the inverse regression function for Y on X given by

(2.10)  $E[Y \mid X = x] = \alpha + \beta x$.

Then by regressing $y_i$ on $x_i$, a regression estimator of $\bar{Y}$ is
given by

(2.11)  $\hat{\bar{Y}}_{IR} = \bar{y} + \hat{\beta} (\bar{X} - \bar{x})$

where

    $\hat{\beta} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / \sum_{i=1}^{n} (x_i - \bar{x})^2$.

The estimator in (2.11) will be called the inverse regression estimator.
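As an illustration of (2.7) and (2.11), the two estimators can be computed from the sampled pairs and the known population mean of the classified proportions. The sketch below is not the authors' code; the function name `regression_estimators` and the sample values are hypothetical.

```python
def regression_estimators(x, y, x_bar_pop):
    """Return (calibration, inverse_regression) estimates of the mean of Y.

    x         -- classified proportions for the n sampled clusters
    y         -- actual proportions for the same clusters
    x_bar_pop -- mean classified proportion over all N clusters (assumed known)
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0 or s_xy == 0:
        raise ValueError("slopes are undefined or zero for this sample")
    delta_hat = s_xy / s_yy  # slope from regressing x on y, as in (2.7)
    beta_hat = s_xy / s_xx   # slope from regressing y on x, as in (2.11)
    calibration = y_mean + (x_bar_pop - x_mean) / delta_hat
    inverse_regression = y_mean + beta_hat * (x_bar_pop - x_mean)
    return calibration, inverse_regression


# Hypothetical sample lying exactly on the line x = 0.2 + 0.5 * y.
y_sample = [0.1, 0.3, 0.5]
x_sample = [0.25, 0.35, 0.45]
cal, inv = regression_estimators(x_sample, y_sample, x_bar_pop=0.40)
print(cal, inv)  # both are approximately 0.4 because the points are collinear
```

When the sampled points are not collinear the two slopes differ and the estimates separate; the calibration estimate is the more volatile one because it divides by an estimated slope.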

In  the  next  section,  we  discuss  the  cluster  proportion  X resulting 
from  the  classification  of  units  based  on  their  observed  data  on  Z and 
investigate  the  two  regression  models  corresponding  to  (2.6)  and  (2.10). 


3.  REGRESSION  MODELS 


3.1  Determination  of  X 


Suppose that Z is a p x 1 random vector distributed normally with
mean vectors $\mu_1$ and $\mu_0$ for classes $C_1$ and $C_0$, respectively,
and common covariance matrix $\Sigma$. By a set of linear transformations,
the class structures can be expressed in the canonical form:

(3.1)  $Z \sim \begin{cases} N(-(\Delta/2)e, I), & u \in C_1 \\ N((\Delta/2)e, I), & u \in C_0 \end{cases}$

where

    $\Delta^2 = (\mu_1 - \mu_0)' \Sigma^{-1} (\mu_1 - \mu_0)$

    $e = (1, 0, \ldots, 0)'$.

When $\Delta$ is known, the discriminant function based on the log
likelihood ratio is linear and the maximum likelihood classification rule
is to classify a measurement unit in $C_1$ if $Z_1 < 0$, and in $C_0$
otherwise, where $Z_1$ is the first component of the measurement vector Z.
In terms of the random variable $\psi$ defined in (2.2), we have

(3.2)  $\psi(Z) = \begin{cases} 1, & \text{if } Z_1 < 0 \\ 0, & \text{otherwise.} \end{cases}$

Then the omission and commission error rates for $\psi$ are each equal
to $\Phi(-\Delta/2)$, where $\Phi$ denotes the cdf of the standard normal
distribution.
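For a sense of scale, the common error rate $\Phi(-\Delta/2)$ can be evaluated for the two separability values used later in the simulation (1.5 and 3.0). This is a minimal sketch using only the standard library; the helper names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ml_error_rate(delta):
    """Omission (= commission) error rate of rule (3.2) under model (3.1)."""
    return normal_cdf(-delta / 2.0)


for delta in (1.5, 3.0):
    rate = ml_error_rate(delta)
    # With equal error rates, the slope in (2.5) is 1 - 2 * rate.
    print(delta, round(rate, 4), round(1.0 - 2.0 * rate, 4))
```

A separability of 1.5 gives an error rate of roughly 0.23 and a separability of 3.0 roughly 0.07, so the slope relating X to Y in (2.5) is much closer to 1 in the second case.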


In the present context, one needs to evaluate the classification
error rates for individual clusters. Because the cluster size is
assumed large, the class distributions for each cluster can be
approximated by the normal. For cluster i, let $\xi_i e$ and
$(\xi_i + \Delta_i)e$ be the mean vectors of $C_1$ and $C_0$, respectively,
and let I be the common covariance matrix. This distributional assumption
allows variation in class distributions across clusters in the
population. To have the population means as assumed in (3.1), we assume
that the average value of $\xi_i$ across clusters is $-\Delta/2$ and that of
$(\xi_i + \Delta_i)$ is $\Delta/2$. The actual classification error rates
$\theta_{0i}$ and $\theta_{1i}$ and the classified proportion $X_i$ for
cluster i, denoted by $B_i$, are easily obtained as follows:

(3.3)  $\theta_{0i} = P[Z_1 > 0 \mid \eta(Z) = 1, B_i] = \Phi(-\Delta/2 + \xi_i)$

       $\theta_{1i} = P[Z_1 < 0 \mid \eta(Z) = 0, B_i] = \Phi(\Delta/2 - \xi_i - \Delta_i)$

and

    $X_i = \Phi(\Delta/2 - \xi_i - \Delta_i) + [\Phi(\Delta/2 - \xi_i) - \Phi(\Delta/2 - \xi_i - \Delta_i)] Y_i$.

The sample analogue of the linear discriminant function is obtained
by replacing the parameters by their estimators and is given by

    $\lambda(Z) = [Z - (1/2)(\bar{Z}_1 + \bar{Z}_0)]' S^{-1} (\bar{Z}_1 - \bar{Z}_0)$

where $\bar{Z}_1$ and $\bar{Z}_0$ are the sample mean vectors for $C_1$ and
$C_0$, respectively, and S is the common sample covariance matrix obtained
from the training samples $Z_{1j}$ and $Z_{0j}$ of $C_1$ and $C_0$,
respectively. Suppose a set of clusters are randomly selected and the
training samples consist of all the units in the sampled clusters. Because
the training sample size is large, the statistics $\bar{Z}_1$, $\bar{Z}_0$
and S are approximately equal to the class parameters; therefore the
classifier and the error rates discussed above will approximately hold
true.

In practice, the parameters $\xi_i$, $\Delta_i$, and $Y_i$ are unknown
except for the sampled clusters, and hence the error rates and $X_i$
cannot be obtained for all clusters using (3.3). However, once the
classification of data is completed using a classifier, $\hat{X}_i$ can be
computed directly as the proportion of units from $B_i$ that belong to
$C_1'$. Accordingly, the values of the two estimators of $\bar{Y}$
described in section 2 can be computed.


3.2  CALIBRATION  MODEL 


When $\Delta$ is known, one can write from (2.5) for cluster i that

(3.4)  $X_i = \gamma_i + \delta_i Y_i$

where

    $\gamma_i = \theta_{1i}$  and  $\delta_i = 1 - \theta_{0i} - \theta_{1i}$

with $\theta_{1i}$ and $\theta_{0i}$ given in (3.3).

Thus, the regression function in (2.6) can be associated with the
model

(3.5)  $X_i = \gamma + \delta Y_i + \epsilon_i$

where

(3.6)  $\epsilon_i = (\gamma_i - \gamma) + (\delta_i - \delta) Y_i$.

Since the classification rule (3.2) is independent of the $Y_i$ for
known $\Delta$, $\gamma_i$ and $\delta_i$ are independent of $Y_i$. Let

(3.7)  $\gamma = E[\gamma_i]$
       $\delta = E[\delta_i]$.

Then

    $E[\epsilon_i \mid Y_i] = 0$

so that the regression function in (2.6) holds and the model given in
(3.5) is linear.

In  the  case  of  a finite  population  of  N clusters,  one  may  consider 
Now, let us consider the more likely case for which parameters are
unknown. These parameters are estimated from the training samples,
resulting in the classification rule given by (3.2) with 0 replaced by
the estimated boundary value. Then, corresponding to (3.3), we have

    $\hat{X}_i = \hat{\theta}_{1i} + (1 - \hat{\theta}_{0i} - \hat{\theta}_{1i}) Y_i$

where $\hat{\theta}_{1i}$ and $\hat{\theta}_{0i}$ denote the estimates of
$\theta_{1i}$ and $\theta_{0i}$, respectively. Further, let

    $d\theta_{0i} = \hat{\theta}_{0i} - \theta_{0i}$

    $d\theta_{1i} = \hat{\theta}_{1i} - \theta_{1i}$

    $dX_i = \hat{X}_i - X_i = d\theta_{1i} - (d\theta_{0i} + d\theta_{1i}) Y_i$.

Again, the deviations $d\theta_{0i}$, $d\theta_{1i}$ and $dX_i$ are
independent of $Y_i$.

Since $\hat{X}_i = X_i + dX_i$, it follows from (3.5) that we now have the
model

(3.8)  $\hat{X}_i = \gamma + \delta Y_i + \eta_i$

where

(3.9)  $\eta_i = \epsilon_i + dX_i$

with $\epsilon_i$ given by (3.6).
The distributions of the deviations $d\theta_{0i}$ and $d\theta_{1i}$ are
quite complicated; however, asymptotically, that is, as the training
sample size becomes large, their means go to zero (Efron [2]). Thus, as
the training sample size gets large, $E(dX_i \mid Y_i)$ goes to zero. Hence,
in the case of large training samples,

    $E(\eta_i \mid Y_i) = E(\epsilon_i \mid Y_i) + E(dX_i \mid Y_i)$

goes to zero and the model (3.8) is linear.


Suppose $\sigma_\gamma^2$, $\sigma_\delta^2$ and $\sigma_{\gamma\delta}$ are
the variances and covariance, respectively, of $\gamma_i$ and $\delta_i$
across clusters. Then the conditional error variance,
$V(\epsilon_i \mid Y_i) = \sigma_i^2$, is given by

(3.10)  $\sigma_i^2 = \sigma_\gamma^2 + Y_i^2 \sigma_\delta^2 + 2 Y_i \sigma_{\gamma\delta}$.

The conditional variance due to the training sample,
$V(dX_i \mid Y_i) = \sigma_{di}^2$, is of order 1/m, where m is the number of
units in the training sample. This variance is negligible when the
training sample size is large. Of course, the total conditional error
variance for model (3.8), given by

    $V_i = V(\eta_i \mid Y_i) = \sigma_i^2 + \sigma_{di}^2$,

is non-constant.
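The variance in (3.10) is the variance of the linear combination $\gamma_i + \delta_i Y_i$ with $Y_i$ held fixed, which is why it changes with $Y_i$. The sketch below evaluates it for a few values of $Y_i$; the variance and covariance inputs are hypothetical numbers chosen only to show the effect of a negative covariance, not values taken from the study.

```python
def conditional_error_variance(y, var_gamma, var_delta, cov_gamma_delta):
    """Evaluate (3.10): Var(gamma_i + delta_i * y) for a fixed proportion y."""
    return var_gamma + y * y * var_delta + 2.0 * y * cov_gamma_delta


# Hypothetical inputs; the covariance is negative because gamma_i is the
# commission error and delta_i decreases as either error rate increases.
var_gamma, var_delta, cov_gamma_delta = 0.004, 0.004, -0.003

for y in (0.0, 0.25, 0.5):
    print(y, conditional_error_variance(y, var_gamma, var_delta, cov_gamma_delta))
```

With these inputs the variance falls as y grows from 0 to 0.5, which is one way a non-constant error variance can arise in model (3.8).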
3.3  INVERSE REGRESSION MODEL

One can express (3.4) inversely as

    $Y_i = -\gamma_i / \delta_i + (1 / \delta_i) X_i$.

Let $\alpha_i = -\gamma_i / \delta_i$ and $\beta_i = 1 / \delta_i$ so that

    $Y_i = \alpha_i + \beta_i X_i$.

Thus, the regression function in (2.10) can be conceptualized in terms
of the model

(3.10)  $Y_i = \alpha + \beta X_i + e_i$

where

(3.11)  $e_i = (\alpha_i - \alpha) + (\beta_i - \beta) X_i$.

In the case of training a classifier, the model becomes

(3.12)  $Y_i = \alpha + \beta \hat{X}_i + \zeta_i$

where

(3.13)  $\zeta_i = e_i - \beta \, dX_i$.
Models (3.10) and (3.12) are not linear. This can be easily seen by
showing that the covariance of $X_i$ and $e_i$ in (3.10) is not necessarily
zero. Hence, the regression function in (2.10) cannot be obtained from
models (3.10) and (3.12). One would therefore wonder about the basis of
the inverse regression estimator of $\bar{Y}$ given in (2.11) in section 2.

In general, Lwin and Maritz [6] showed that the inverse regression
estimator is a compound estimator of the sample mean and the calibration
estimator obtained assuming that the model (3.8) holds. They assume a
constant error variance whereas, in the present situation, the error
variance is non-constant. Their formulation does not require the inverse
regression model as in (3.10) or (3.12) to be linear; hence the
inverse regression estimator of $\bar{Y}$ put forth in section 2 can be
justified without reference to this assumption, as was done by Lwin and
Maritz.
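The compound structure can be checked directly for the two estimators as defined in (2.7) and (2.11). The algebra below is a short verification under those definitions, not a reproduction of Lwin and Maritz's argument; $r^2$ denotes the squared sample correlation between the $x_i$ and the $y_i$, and $S_{xy}$, $S_{xx}$, $S_{yy}$ are the usual sums of cross-products and squares about the sample means.

```latex
\begin{align*}
\hat{\delta} = \frac{S_{xy}}{S_{yy}}, \qquad
\hat{\beta} = \frac{S_{xy}}{S_{xx}}
\quad &\Longrightarrow \quad
\hat{\beta}\,\hat{\delta} = \frac{S_{xy}^2}{S_{xx} S_{yy}} = r^2, \\[4pt]
\hat{\bar{Y}}_{IR}
  = \bar{y} + \hat{\beta}\,(\bar{X} - \bar{x})
  &= \bar{y} + r^2\,\frac{\bar{X} - \bar{x}}{\hat{\delta}}
  = (1 - r^2)\,\bar{y} + r^2\,\hat{\bar{Y}}_{C} .
\end{align*}
```

The inverse regression estimate therefore always lies between the sample mean and the calibration estimate, and it is pulled toward the sample mean when the sample correlation is weak.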

Presently,  no  further  analytical  investigation  of  the  models  and 
the  estimators  is  undertaken.  Instead,  a simulation  study  was  conducted 
to  evaluate  the  models  and  compare  the  two  estimators.  This  simulation 
is  described  in  the  next  section. 


4.  SIMULATION  STUDY 


To investigate the linearity of the calibration and inverse regression
models given by (3.8) and (3.12), respectively, and to compare the
performance of the two estimators of the population mean, the following
simulation study was conducted. A hypothetical population consisting of
N clusters was considered. The number of units per cluster is assumed
infinite. For a given population mean, $\bar{Y}$, the beta distribution
(IMSL subroutine GGBTR) was used to generate the actual proportions,
$Y_i$, of the class of interest for each of the N clusters. For each
cluster, the distribution of the auxiliary measurement variable $Z_1$ for
the class of interest was assumed normal with mean
$\xi_i = \mu_i - \Delta_i/2$ and variance 1.

For the other class, the distribution was assumed normal with mean
$\xi_i + \Delta_i = \mu_i + \Delta_i/2$ and a variance of 1. The normal
distribution (IMSL subroutine GGNML) with mean 0 and variance $\sigma^2$
was used to generate the $\mu_i$, and the triangular distribution (IMSL
subroutine GGTRA) over the interval $(\Delta - \rho, \Delta + \rho)$, which
has mean $\Delta$ and range $2\rho$, was used to generate the $\Delta_i$
with $\Delta$ and $\rho$ specified. To ensure $\Delta_i > 0$, it was assumed
that $\Delta > \rho$.

The indices of each of the variables for the N clusters were
randomly permuted (IMSL subroutine GGPER) and the first n indices were
selected as the sample for which the actual proportions were assumed
known.
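The generation steps above can be sketched with Python's standard `random` module in place of the IMSL subroutines. Several details here are assumptions rather than facts from the study: the beta shape parameters are set from the target mean and a hypothetical `concentration` value (the source gives only the mean), `sigma` is treated as the standard deviation of the location offsets, and the function names are illustrative.

```python
import random


def generate_population(n_clusters, y_mean, sigma, delta, rho,
                        concentration=7.0, seed=0):
    """Return a list of (y_i, mu_i, delta_i) tuples, one per cluster."""
    if not (delta > rho >= 0.0):
        raise ValueError("need delta > rho >= 0 so that every delta_i > 0")
    rng = random.Random(seed)
    # Assumed parameterization: beta(a, b) with mean a / (a + b) = y_mean.
    a = y_mean * concentration
    b = (1.0 - y_mean) * concentration
    clusters = []
    for _ in range(n_clusters):
        y_i = rng.betavariate(a, b)          # actual proportion
        mu_i = rng.gauss(0.0, sigma)         # location offset of the cluster
        if rho > 0.0:
            # Symmetric triangular distribution on (delta - rho, delta + rho).
            delta_i = rng.triangular(delta - rho, delta + rho)
        else:
            delta_i = delta
        clusters.append((y_i, mu_i, delta_i))
    return clusters


def draw_sample_indices(n_clusters, n_sample, seed=0):
    """Randomly permute the cluster indices and keep the first n_sample."""
    rng = random.Random(seed)
    indices = list(range(n_clusters))
    rng.shuffle(indices)
    return indices[:n_sample]


population = generate_population(500, y_mean=0.25, sigma=0.1, delta=1.5, rho=1.0)
sample = draw_sample_indices(500, 10)
print(len(population), sample)
```

Fixing the seed makes a replication reproducible; a study with many replications would vary the seed from run to run.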

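The discriminant boundary parameter, say $\tau$, was then estimated by

(4.1)  $\hat{\tau} = \frac{\sum_{i=1}^{n} Y_i (\mu_i - \Delta_i/2)}{\sum_{i=1}^{n} Y_i} + \frac{\sum_{i=1}^{n} (1 - Y_i)(\mu_i + \Delta_i/2)}{\sum_{i=1}^{n} (1 - Y_i)}$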
and the classification rule for all measurement units u in the
population was taken as

(4.2)  $\psi(Z) = \begin{cases} 1, & \text{if } Z_1 < \hat{\tau}/2 \\ 0, & \text{otherwise} \end{cases}$

for the measurement vector Z. Hence, a unit is classified in $C_1'$ if
$Z_1 < \hat{\tau}/2$ and into $C_0'$ otherwise. Note that we did not
actually generate the measurement vector Z. These results correspond to
having an infinite number of measurement units for training, since we
assumed the number of units per cluster to be infinite. The actual errors
of misclassification were computed for each of the N clusters from (3.3)
as

    $\hat{\theta}_{0i} = \mathrm{Prob}[Z_1 > \hat{\tau}/2 \mid u \in C_1, B_i, \hat{\tau}] = \Phi(-(\hat{\tau} + \Delta_i)/2 + \mu_i)$

and

    $\hat{\theta}_{1i} = \mathrm{Prob}[Z_1 \le \hat{\tau}/2 \mid u \in C_0, B_i, \hat{\tau}]$.

Likewise, using the relationships in sections 3.2 and 3.3, $\hat{X}_i$,
$\hat{\gamma}_i$, $\hat{\delta}_i$, $\hat{\alpha}_i$ and $\hat{\beta}_i$ were
computed for each of the N clusters.
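Because no measurement vectors are generated, each cluster's error rates and classified proportion follow in closed form from the normal cdf. The sketch below assumes the class means stated in this section (mu_i - delta_i/2 for the class of interest and mu_i + delta_i/2 for the other class, both with unit variance) and takes the estimated boundary parameter `tau_hat` as an input; the commission-error expression is derived from those assumptions rather than quoted from the text, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cluster_classification(y_i, mu_i, delta_i, tau_hat):
    """Return (theta0, theta1, x_hat) for one cluster of infinite size."""
    boundary = tau_hat / 2.0
    mean_c1 = mu_i - delta_i / 2.0  # class of interest
    mean_c0 = mu_i + delta_i / 2.0  # other class
    # Omission error: a unit of the class of interest falls above the boundary.
    theta0 = normal_cdf(mean_c1 - boundary)
    # Commission error: a unit of the other class falls at or below the boundary.
    theta1 = normal_cdf(boundary - mean_c0)
    # Classified proportion from relation (2.5).
    x_hat = theta1 + (1.0 - theta0 - theta1) * y_i
    return theta0, theta1, x_hat


theta0, theta1, x_hat = cluster_classification(y_i=0.25, mu_i=0.0,
                                               delta_i=3.0, tau_hat=0.0)
print(round(theta0, 4), round(theta1, 4), round(x_hat, 4))
```

The per-cluster coefficients then follow as gamma_i = theta1, delta_i = 1 - theta0 - theta1, alpha_i = -gamma_i / delta_i and beta_i = 1 / delta_i, matching sections 3.2 and 3.3.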
This process was replicated 500 times for each combination of
parameters considered in order to compute the bias, variance, and mean
squared error for the calibration and inverse regression estimators of
$\bar{Y}$. One hundred replications were made to compute the model errors,
$\eta_i$ and $\zeta_i$, and their means and variances. Table 4-1 shows the
values of the parameters used in the simulation.


Figure 4-1(a) shows a histogram of the 500 actual proportions
generated from a beta distribution with mean .25. The actual mean and
variance of these 500 proportions are .2575 and .02366, respectively.
Figure 4-1(b) shows a corresponding histogram of one realization of the
classified proportions resulting from using n = 10, $\sigma$ = .1,
$\Delta$ = 1.5, and $\rho$ = 1. A scatterplot of the actual versus the
classified proportions for this realization is given in figure 4-1(c).
In this case, the relationship is approximately linear and a linear
regression model should hold reasonably well.


TABLE 4-1.- PARAMETER INPUT VALUES

    N = 500
    n = 4, 10, 30
    $\bar{Y}$ = .05, .1, .25, .5
    $\sigma$ = .01, .10, .50
    $\Delta$ = 1.5, 3.0
    $\rho$ = 0, .5, 1

A plot of the model errors, $\eta_i$, for the calibration model (3.8),
versus the actual proportions, $Y_i$, for all 500 clusters for one
realization generated using $\bar{Y}$ = .25, n = 10, $\sigma$ = .1,
$\Delta$ = 1.5 and $\rho$ = 1 is given in figure 4-2. Note that no obvious
relationship exists between $\eta_i$ and $Y_i$, supporting the linear model
requirement that $E[\eta_i \mid Y_i] = 0$. Also note that the variance of the
errors tends to decrease with increasing values of $Y_i$; that is, we have
a non-constant conditional error variance.

Figure 4-3 contains a plot of the inverse regression model errors,
$\zeta_i$, versus the classifier proportions, $\hat{X}_i$, for the same
parameter values. Note the linear dependence, indicating that
$E[\zeta_i \mid \hat{X}_i]$ is non-zero and the inverse regression model is
not linear. The variance of these errors is also non-constant, tending to
increase as the classifier proportion increases.

Table 4-2 contains the means and average variances of the calibration
and inverse regression model errors computed from 100 replications
based on $\bar{Y}$ = .25, $\sigma$ = .1, $\Delta$ = 1.5, $\rho$ = 1 and three
sample sizes, n = 4, 10 and 30. The mean error is zero for the calibration
model and non-zero for the inverse regression model, as expected.
Moreover, the error variance is much larger for the latter model. Notice
that the means and average variances are about the same for all three
sample sizes. This is because the number of units per cluster is infinite
in this simulation study. In the USDA problem which motivated this
study, the number of units per cluster is large; there are 600 or more
pixels per sample segment used for training. Results similar to those
in figures 4-2 and 4-3 and table 4-2 were obtained for the other
combinations of parameters presented in table 4-1. In summary, the
calibration model given by (3.8) was found to be the appropriate linear
model relating the classifier and the actual proportions in this
simulation.

The coefficients $\gamma$, $\delta$ in model (3.8) and $\alpha$, $\beta$ in
model (3.12) were computed directly as averages of $\hat{\gamma}_i$,
$\hat{\delta}_i$ and $\hat{\alpha}_i$, $\hat{\beta}_i$, respectively, for
the 500 segments for each replication. Then their averages were
obtained from the 500 replications. Also, the corresponding least
squares fits were obtained in each case and these coefficients were
estimated for each of the 500 replications. The results for the average
computed (actual) and estimated values are given in table 4-3.


TABLE 4-2.- MODEL ERROR STATISTICS BASED ON 100 REPLICATIONS*

                                        Average for 100 replications
    Model               Error statistic   n = 4          n = 10         n = 30

    Calibration         Mean              0              0              0
                        Variance          22.5 x 10^-4   22.6 x 10^-4   22.6 x 10^-4

    Inverse regression  Mean              .0184          .0185          .0185
                        Variance          93.7 x 10^-4   94.0 x 10^-4   93.9 x 10^-4

    *$\bar{Y}$ = .25, $\sigma$ = .1, $\Delta$ = 1.5, and $\rho$ = 1.

The results in table 4-3 show that the coefficients $\gamma$, $\delta$ in
the calibration model (3.8) are estimated without bias by the least
squares estimates obtained from regressing $\hat{X}_i$ on $Y_i$, whereas
the coefficients $\alpha$, $\beta$ in the inverse regression model (3.12)
are estimated with bias when regressing $Y_i$ on $\hat{X}_i$. This again
shows that the model (3.8) is linear, but model (3.12) is not.


TABLE 4-3.- ACTUAL AND ESTIMATED VALUES OF REGRESSION COEFFICIENTS*

    Model               Coefficient   Actual   Estimate

    Calibration         $\gamma$      .237     .235
                        $\delta$      .528     .534

    Inverse regression  $\alpha$      -.515    -.261
                        $\beta$       2.024    1.383

    *Averaged from 500 replications.


The summary results for the calibration and inverse regression
estimators of the population mean $\bar{Y}$ for 500 replications for the
parametric case $\bar{Y}$ = .25, n = 10, $\sigma$ = .10, $\Delta$ = 1.5 and
$\rho$ = 1 are presented in table 4-4. The two estimators were truncated
at 0 and 1 for estimates outside this range before computing the summary
statistics. Truncation was actually needed only for the calibration
estimator. The MSE ratio in table 4-4 is the ratio of the mean squared
error for an estimator to the mean squared error using the sample mean
of the n sampled actual proportions. This, of course, is an estimate of
the efficiency of the sample mean relative to the estimator.

The bias is negligible for each estimator, though it is statistically
significant in the case of the inverse regression estimator. The MSE
ratio of .342 for the inverse regression estimator is much smaller than
that of the calibration estimator, which performs rather poorly, as its
MSE ratio is greater than 1.
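The summary statistics can be sketched as follows for a list of replicate estimates. The truncation to [0, 1] and the MSE ratio follow the description above. The t-statistic is not defined in the text; taking it as the bias divided by sqrt(variance / replications) approximately reproduces the values in table 4-4 (for the sample mean, .0020 / sqrt(.0021 / 500) is about .98 against the reported .97), so that form is assumed here. Function names are hypothetical.

```python
import math


def summarize(estimates, true_mean, truncate=True):
    """Bias, variance, MSE and an assumed t-statistic over replications."""
    if truncate:
        # Estimates outside [0, 1] are pulled back to the nearest bound.
        estimates = [min(1.0, max(0.0, e)) for e in estimates]
    reps = len(estimates)
    mean_est = sum(estimates) / reps
    bias = mean_est - true_mean
    variance = sum((e - mean_est) ** 2 for e in estimates) / (reps - 1)
    mse = sum((e - true_mean) ** 2 for e in estimates) / reps
    # Assumed definition: bias relative to its estimated standard error.
    t_stat = bias / math.sqrt(variance / reps) if variance > 0 else float("nan")
    return {"bias": bias, "variance": variance, "mse": mse, "t": t_stat}


def mse_ratio(estimator_stats, sample_mean_stats):
    """MSE of an estimator relative to the MSE of the sample mean."""
    return estimator_stats["mse"] / sample_mean_stats["mse"]


# Hypothetical replicate estimates of a population mean of 0.25.
stats = summarize([0.2, 0.3, 0.4], true_mean=0.25)
print(stats)
```

A ratio below 1 means the estimator improves on simply averaging the n sampled actual proportions.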

The summary results for various combinations of parameter values
showed that the MSE ratio for the inverse regression estimator was
always less than or equal to that of the calibration estimator. The
class separability and sample size influenced the performance of the
estimators the most. The calibration estimator performed very poorly
for n = 4. The inverse regression estimator tended to have statistically
significant, yet negligible, bias for n = 10 unless $\sigma$ = .5, in which
case it was not necessarily negligible. Note that an increase in the value
of $\sigma$ leads to an increase in variability of the means for the class
of interest across clusters and, hence, an increase in the variability of
the classified proportions.


TABLE 4-4.- SUMMARY STATISTICS FOR 500 REPLICATIONS*

    Estimator            Bias     Variance   MSE     MSE Ratio   t-Statistic

    Sample Mean          .0020    .0021      .0021   1.000       .97
    Inverse Regression   -.0033   .0007      .0007   .342        -2.77
    Calibration          .0002    .0023      .0023   1.088       .12

    *$\bar{Y}$ = .25, n = 10, $\sigma$ = .1, $\Delta$ = 1.5, and $\rho$ = 1.0
The relationship between the actual and classified proportions is
greatly influenced by the class separability parameters $\Delta$, $\rho$,
and $\sigma$. When the mean separability $\Delta$ is large and the
variability in class separability and mean locations across clusters
(that is, $\rho$ and $\sigma$) is small, the relationship is fairly linear,
resulting in a high correlation coefficient between the actual and
classified proportions for the clusters in the population. The population
correlation coefficient decreases as $\Delta$ decreases and/or either of
the other two parameters $\rho$ and $\sigma$ increases.

In each simulation run, the square of the correlation coefficient
was computed for both the population and the sample. Variation in the
population correlation coefficient arises because the decision rule for
the maximum likelihood classifier varies from sample to sample over the
500 replications. Three scatterplots of the sample $R^2$ versus the
population $R^2$ corresponding to three sample sizes are presented in
figure 4-4, showing the values obtained for the combination of parameter
values in table 4-2. Each scatterplot shows a fairly high population $R^2$
with very small variation regardless of sample size, whereas the sample
$R^2$ is highly variable, with its variability decreasing as the sample
size increases. These scatterplots show the potential hazard in using the
sample $R^2$ as an indicator of the actual linear relationship for the
population.
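The two quantities being compared are the same statistic computed on different sets of clusters: the sample $R^2$ uses only the n sampled pairs, while the population $R^2$ uses all N pairs. A minimal sketch of the computation, with a hypothetical function name and toy data:

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (s_xy * s_xy) / (s_xx * s_yy)


# Toy data: three (classified, actual) pairs that are only weakly related.
print(r_squared([1, 2, 3], [1, 3, 2]))
```

With only a handful of sampled clusters (n = 4, for example) this statistic can swing widely from one sample to the next even when the population value is stable, which is the hazard noted above.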

Figures 4-5, 4-6, 4-7, and 4-8 summarize the MSE ratios for the two
estimators for all combinations of parameter values given in table 4-1.
Figures 4-5 and 4-6 contain scatterplots of the MSE ratio versus the
mean population $R^2$ for $\bar{Y}$ = .5 and $\Delta$ values of 3 and 1.5,
respectively. Figures 4-7 and 4-8 show the MSE ratio for $\bar{Y}$ = .25.
The symbol '*' in each scatterplot represents the inverse regression
estimator and the symbol '+' is for the calibration estimator. MSE ratios
for the calibration estimator were truncated at 1.6 to provide uniform
plots.

Figures 4-5 and 4-7 indicate that when the separability is large
across clusters and the sample size is large, the two estimators have
similar MSE ratios, and both are substantially more efficient than the
sample mean. When the sample size is reduced, both estimators
are affected. The calibration estimator is severely degraded,
yielding MSE ratios much larger than 1 for n = 4. The inverse regression
estimator performs well overall, with a modest decrease in efficiency
when n = 4.

When the separability between classes is reduced, as is the case in
figures 4-6 and 4-8, there is a tremendous effect on the mean population
$R^2$, which is reflected in the performance of the two estimators. For
$\Delta$ = 3, $R^2$ ranged from .75 to 1.0. When $\Delta$ was reduced to 1.5,
it ranged from .23 to 1.0. Note that the inverse regression estimator is
still superior, except when n = 4, in which case both estimators perform
worse than the sample mean at low values of $R^2$.


5.  CONCLUDING  REMARKS 


In the present study the individual random values of Z for the
within-cluster measurement units were not simulated, and the current
results were obtained by generating the parametric values directly for
the clusters. As mentioned previously, this is equivalent to having an
infinite number of units per cluster. However, the introduction of the
range in class separability (i.e., parameter $\rho$) and the variability in
the two class means $\xi_i$ and $\xi_i + \Delta_i$, $i = 1, 2, \ldots, N$,
across clusters allowed a realistic simulation of the data structure
likely to arise at the cluster level in the context of the problem of
crop acreage estimation described in section 1. Further study is underway
to generate the random values for Z and to investigate the inverse
regression and other estimators of crop acreages using results from
classification of the individual units within clusters.

Scatterplots of model errors versus actual proportions for the
calibration model and model errors versus classifier proportions for the
inverse regression model were made for all 500 clusters for various
parametric cases. These plots indicated that the assumption of linearity
for the calibration model holds true; however, the model error has a
non-constant variance when the separability of the two class
distributions varies greatly across clusters in the population. The
inverse regression model is not linear.
Comparisons of the two estimators of the population mean $\bar{Y}$
indicated the inverse regression estimator to be better than the
calibration estimator. The calibration estimator is unreliable unless the
population $R^2$ between the actual and classifier proportions is high (.8
or larger). The inverse regression estimator is statistically
significantly biased in some cases; however, in each of these cases the
bias was negligible. The bias was statistically significant because the
variance of the inverse regression estimator was quite small.
Figure 4-2.- Scatterplot of calibration model errors versus
actual proportions with $\bar{Y}$ = .25, n = 10, $\sigma$ = .1,
$\Delta$ = 1.5 and $\rho$ = 1.

Figure 4-3.- Scatterplot of inverse regression model errors
versus classifier proportions ($\hat{X}_i$) with $\bar{Y}$ = .25,
n = 10, $\sigma$ = .1, $\Delta$ = 1.5 and $\rho$ = 1.

Figure 4-4.- Sample $R^2$ versus population $R^2$, with
$\sigma$ = .1, $\Delta$ = 1.5, and $\rho$ = 1.0.

Figure 4-5.- Scatterplots of the MSE ratio versus the mean
population $R^2$ for the inverse regression
estimator (*) and the calibration estimator (+)
for $\bar{Y}$ = .5 and $\Delta$ = 3.

Figure 4-6.- Scatterplots of the MSE ratio versus the mean
population $R^2$ for the inverse regression
estimator (*) and the calibration estimator (+)
for $\bar{Y}$ = .5 and $\Delta$ = 1.5.
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ABSTRACT 


This  paper  describes  the  evidence  accumulation  process  of  an  image 
understanding  system  first  described  in  [1] , which  enables  the  system  to 
perform  top-down  (goal-oriented)  picture  processing  as  well  as  bottom-up 
verification  of  consistent  spatial  relations  among  objects. 
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l.  Introduction 

In a previous report [1], we described the organization of an
aerial image analysis system. There are three levels of representation
and control in that system: a High Level Expert (HLE) that utilizes
a symbolic hierarchical model for the possible spatial organization
of objects in the image to build partial, local interpretations
of the image and to determine where to further analyze the image and
what analyses to perform; a Model Selection Expert (MSE) that determines,
on the basis of contextual information provided by the HLE,
the most promising appearance descriptions to use in searching for
objects and structures in the image; and a Low Level Vision
Expert (LLVE) that finds pictorial entities that satisfy these appearance
descriptions by selecting image processing methods to find the
appropriate entities.

Our emphasis has been on the High Level Expert, which is based
on a general method of "evidence accumulation" to perform flexible
spatial reasoning. This paper contains a detailed description of our
evidence accumulation process and its associated consistency checking
process.


2.  Motivation 


In general, two different types of information can be used to
interpret a pictorial entity: its intrinsic properties (size, shape,
color, etc.) and its relations to other entities. Our primary
interest is the representation of geometric relations among objects
and their utilization for image interpretation. This is especially
important in recognition of man-made objects. Moreover, although
shape can often be regarded as an intrinsic object property, a complex
shape is often described structurally in terms of geometric relations
among its components. Thus shape recognition often requires spatial
analysis.

Let REL(O1, O2) denote a binary geometric relation between two
classes of objects, O1 and O2. This relation can be used as a constraint
to recognize objects from these two classes by first extracting
pictorial entities which satisfy the intrinsic properties of O1
and O2, and then checking that the geometric relation is satisfied by
these candidate objects (Figure 1). In this bottom-up recognition
scheme, analysis based on geometric relations cannot be performed
until pictorial entities corresponding to objects are extracted.

In general, however, some of the correct pictorial entities
often fail to be extracted by the initial image segmentation. So one
must, additionally, incorporate top-down control to find pictorial
entities missed by the initial segmentation. Such top-down processes
use geometric relations to predict the locations of missing objects,
as in the system described by Selfridge [2].

It is, of course, generally accepted that image understanding
systems should incorporate both bottom-up and top-down analyses. As
noted above, the use of geometric relations is very different in the
two analysis processes: consistency verification in bottom-up analysis
and hypothesis generation in top-down analysis. An important
characteristic of our evidence accumulation method is that it enables the
system to integrate both bottom-up and top-down processes into a single
flexible spatial reasoning process. As will be described later,
the system first establishes local environments. Then, either
bottom-up or top-down processes are activated depending on the nature
of the local environment. The following sections describe the concepts
and characteristics of this process.

3. Representation of Geometric Relations and Hypothesis Formation

3.1.  Functional  Representation  of  Relations 

A relation REL(O1, O2) (O1 and O2 are object classes) is
represented using two functional expressions:

O1 = f(O2) and O2 = g(O1).

Given an instance of O2, say r, function f maps it into a description
of an instance of O1, f(r), which satisfies the geometric relation,
REL, with r. The analogous interpretation holds for the other
function g.

In our system, knowledge about a class of objects is represented
by a frame [3], and a slot in that frame is used to store a function
such as f or g. The function is represented by a computational
procedure (which produces the description of the related instance) and
a set of conditions to specify when that function can be activated.
Whenever an instance of an object is created, and the conditions are
satisfied, the function is applied to the instance to generate a
hypothesis (expectation) for another object which would, if found,
satisfy the geometric relation with the original instance. The function
can use any properties of the instance to create the hypothesis.
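
The frame-slot mechanism described above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the names `Instance`, `Hypothesis`, `RelationSlot` and `predict_next_road_piece`, the rectangle representation of regions, and the width tolerance are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed region format


@dataclass
class Instance:
    category: str
    region: Rect
    properties: Dict[str, float] = field(default_factory=dict)


@dataclass
class Hypothesis:
    category: str                                # class of the expected object
    prediction_area: Rect                        # where it may be located
    constraints: Dict[str, Tuple[float, float]]  # attribute -> (low, high)
    source: Instance                             # instance that generated it


@dataclass
class RelationSlot:
    """A slot holding a function such as f or g plus its activation condition."""
    condition: Callable[[Instance], bool]
    procedure: Callable[[Instance], Hypothesis]

    def apply(self, inst: Instance) -> Optional[Hypothesis]:
        # The procedure runs only when the activation condition holds.
        return self.procedure(inst) if self.condition(inst) else None


def predict_next_road_piece(rp: Instance) -> Hypothesis:
    # Hypothetical rule: expect a piece of equal length directly to the right.
    x0, y0, x1, y1 = rp.region
    w = rp.properties["width"]
    area = (x1, y0, x1 + (x1 - x0), y1)
    return Hypothesis("road-piece", area, {"width": (w - 1, w + 1)}, rp)


adjacent_slot = RelationSlot(
    condition=lambda inst: "width" in inst.properties,
    procedure=predict_next_road_piece,
)

rp1 = Instance("road-piece", (0, 0, 10, 4), {"width": 4})
h = adjacent_slot.apply(rp1)
print(h.prediction_area, h.constraints)
```

The condition guards the procedure, so an instance whose width slot is still undetermined simply produces no hypothesis in this sketch.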

A hypothesis is associated with a prediction area where the
related object instance may be located (Figure 2). In addition to this
area specification, a set of constraints on the target instance is
associated with the hypothesis. Figure 3 shows the description of a
road hypothesis. All hypotheses and instances are stored in a common
database.

Two types of geometric relations are used in our system: "spatial
relation" (SP) and "part-whole relation" (PW). These two types of
relations are used differently by the system. The PW relations
specify AND/OR hierarchies which represent objects with complex
internal structure. The SP relations represent geometric and
topological relations between objects. In addition, "A-kind-of
relations" (AKO) are used to construct object specialization
hierarchies.

There are several restrictions on the usage of these types of
relations. A hierarchy defined by the PW relation must be a tree
structure. Although SP relations can be established across objects
in different PW hierarchies, an object cannot have an SP relation
with another object in the same PW hierarchy, nor can it establish
multiple SP relations to any other PW hierarchy. These restrictions
were adopted to avoid redundant generation of hypotheses.

Consider the knowledge representations shown in Figures 4(a) and
(b). If object A had an SP relation to object B in the same part-whole
hierarchy (Figure 4(a)), there would be two paths from object A
to generate a hypothesis of object B: one by the SP relation and the
other by the PW relation. This means that if an instance of object A
were constructed, two hypotheses for object B would be generated from
the same instance. The same argument holds in the case shown in
Figure 4(b). Figure 4(c) shows a circular path consisting of SP
relations between objects A, B, and C. This is allowed since no redundant
hypotheses are formed.

Hypothesis generation by an SP relation is done as explained
above, i.e., when an object is instantiated and the set of conditions
needed to generate a hypothesis is satisfied, the function
associated with the SP relation is activated to produce an expectation
area and an associated set of constraints for a target object.
Although, syntactically, SP relations represent binary relations, it
is possible to use them to represent n-ary relations. For example, a
left eye can create a hypothesis for a nose, and can use the known
location of a potential right eye to generate the nose hypothesis.

The system uses PW relations both to group parts into a whole
and to predict missing parts. If an instantiated object corresponds
to a leaf node in the PW hierarchy, then it can directly instantiate
(again, if prespecified conditions hold) its parent node through the
PW relation (Figure 5).

Objects at the leaves of PW hierarchies are instantiated first,
since they correspond directly to low-level image structures. The
presence of a higher level object is represented by an instantiated
PW hierarchy. The parent may then hypothesize the presence of other
missing object parts. For computational simplicity, there are no
hypotheses generated between siblings in the PW hierarchy.


4. Evidence


4.1. The Interpretation Cycle of the High Level Expert

Figure  6 shows  the  organization  of  the  entire  system.  The  High 
Level  Expert  iterates  the  following  steps. 

(1) Each instance of an object generates hypotheses about related
objects using functions stored in the object model (frame).

(2) All pieces of evidence (both instances and hypotheses) are stored
in a common database (iconic database). They are represented using an
iconic data structure which associates highly structured symbolic
descriptions of the instances and hypotheses with regions in a
two-dimensional array.

(3) Pieces of evidence are combined to establish situations. A
situation consists of consistent pieces of evidence.

(4) Focus of attention: since there are many situations, the most
reliable situation is selected.

(5) The selected situation is resolved, which results either in
verification of predictions on the basis of previously
detected/constructed image structures or in top-down image processing
to detect missing objects.

The  system  also  has  two  additional  processes: 

(1) Instantiation of objects at the very beginning of interpretation.
This process is performed by the Model Selection Expert, which
searches for object models that have simple appearances, and directs
the Low Level Vision Expert to detect pictorial entities which
satisfy the appearances. The instances constructed by this process
are seeds for reasoning by the High Level Expert.

(2) Selection of the maximal consistent interpretation.
During the analysis by the High Level Expert, inconsistent pieces of
evidence may be constructed. The High Level Expert maintains all
possible interpretations throughout the search process until no further
changes are made in the iconic database. A final selection step then
chooses the maximal consistent interpretation.

The  following  subsections  provide  detailed  discussion  of  the 
operation of the High Level Expert.

4.2.  Overview 

Given a set of instances of objects, each of them activates
functions to generate hypotheses about related objects. Each
instance and hypothesis is represented as a region in the iconic data
structure. Suppose instance s creates hypothesis f(s) (based on
relation R) for object class O1, which overlaps with an instance of O1,
t (Figure 7(a)). If the set of constraints associated with f(s) is
satisfied by t, these two pieces of evidence are combined to form
what we call a situation. The more pieces of evidence that are
combined, the more reliable the situation becomes. The High Level Expert
unifies f(s) and t, and establishes the relation R from s to t as the
result of resolving the situation.


On the other hand, a situation may consist of overlapping
hypotheses, if their constraints are consistent (Figure 7(b)). Then
their unification leads the expert to search for an instance of the
required object in the image. The High Level Expert asks the Model
Selection Expert to detect the instance, which in turn activates the
Low Level Vision Expert. If the instance is detected, it is inserted
into the database. Hypothesis generation by the newly detected
instance is performed at the next interpretation cycle.

4.3. Handling PW Relations

Additional complications arise from resolving situations involving
instances generated via PW relations. Suppose s is an instance
of an object corresponding to a leaf node in a PW hierarchy (Figure
8(a)). As described above, it may instantiate its parent object. Let
p denote this instance. Then p generates a hypothesis for a missing
part, f(p). If there is already an instance corresponding to the
missing part, say t, f(p) and t will be unified, and a part-whole
relation will be established between p and t. However, since t is
also an instance, it may also have instantiated its parent object.
Let u denote this instance. As the result of the unification,
instance t has two parent instances, p and u. This leads the High
Level Reasoning Expert to another unification. The expert examines p
and u, and if they are consistent, it unifies them (Figure 8(b)). This
unification may trigger still another unification for higher level
instances in the hierarchy. Note that after the unification,
instance p can use properties of s and t to generate hypotheses for
other part objects whose geometric properties could not previously be
specified due to a lack of sufficient information.


Consistent pieces of evidence from different sources are combined
into situations. The consistency among pieces of evidence is
based on:

(1)  prediction  areas  of  hypotheses 

(2)  object  categories  of  evidence 

(3)  constraints  imposed  on  properties  of  hypotheses  and  instances 

(4)  relations  among  sources  of  evidence 

These  criteria  are  discussed  in  the  next  four  subsections. 

4.4.1. Intersections of Prediction Areas

Figure 10(a) shows all intersections formed from pieces of
evidence E1, E2, E3, and E4. A partial ordering on intersections can be
constructed on the basis of region containment. Intersection OP1 is
less than OP2 if region OP1 is contained in region OP2. Figure 10(b)
shows the lattice representing the intersections in Figure 10(a). Each
intersection consists of some set of hypotheses and instances.
Situations are only formed among intersecting pieces of evidence.
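
As a rough illustration of the partial ordering described above, the following sketch enumerates the non-empty intersections among a few prediction areas and tests region containment. It assumes axis-aligned rectangles `(x0, y0, x1, y1)` in place of arbitrary regions of the iconic database, and the helper names and coordinates are hypothetical.

```python
from itertools import combinations


def intersect(a, b):
    """Intersection of two rectangles, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def build_intersections(areas):
    """Map each set of two or more overlapping pieces of evidence to its region."""
    result = {}
    names = sorted(areas)
    for k in range(2, len(names) + 1):
        for subset in combinations(names, k):
            region = areas[subset[0]]
            for name in subset[1:]:
                region = intersect(region, areas[name])
                if region is None:
                    break
            if region is not None:
                result[frozenset(subset)] = region
    return result


def contained_in(inner, outer):
    """Partial order on intersections: inner <= outer by region containment."""
    return (inner[0] >= outer[0] and inner[1] >= outer[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


areas = {
    "E1": (0, 0, 4, 4),
    "E2": (2, 2, 6, 6),
    "E3": (3, 3, 8, 8),
    "E4": (10, 10, 12, 12),  # overlaps nothing, so it joins no intersection
}
ops = build_intersections(areas)
for evidence, region in ops.items():
    print(sorted(evidence), region)
```

Enumerating all subsets is exponential in the number of pieces of evidence; it is used here only because the example is tiny.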


4.4.2. Object Categories of Evidence


In our domain, some pairs of objects cannot occupy the same
location in an image. For instance, a region cannot be interpreted as
both house and road at the same time (although it could be
interpreted both as road and shadow). Pairs of frames representing object
classes which cannot occupy the same region are linked with an
in-conflict-with relation.

Let OP be the intersection arising from evidence {E1, E2} and
let OBJ1 and OBJ2 denote the object categories of E1 and E2,
respectively. If OBJ1 and OBJ2 are linked by an in-conflict-with relation,
then E1 and E2 are said to be conflicting, and OP is removed from the
lattice. The removal of OP is propagated through the lattice, and any
intersections contained in OP are also removed, since they must also
have arisen from conflicting evidence. To find all conflicting
intersections, it is clearly sufficient to examine all intersections
containing only a pair of pieces of evidence and then to propagate
the results through the lattice.
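
The pairwise check and its propagation can be sketched as below. This is a minimal sketch under stated assumptions: intersections are keyed by the set of evidence that produced them, so "contained in OP" is approximated by "arises from a superset of OP's evidence"; the function name, categories and coordinates are hypothetical.

```python
def remove_conflicting(intersections, category, in_conflict_with):
    """Drop every intersection that arises from a conflicting pair of evidence.

    intersections:    dict mapping frozenset of evidence names -> region
    category:         dict mapping evidence name -> object category
    in_conflict_with: set of frozensets, each a pair of conflicting categories
    """
    # Step 1: examine only the intersections formed by exactly two pieces.
    bad_pairs = [
        ev for ev in intersections
        if len(ev) == 2
        and frozenset(category[e] for e in ev) in in_conflict_with
    ]
    # Step 2: propagate, removing anything built on top of a bad pair.
    return {
        ev: region
        for ev, region in intersections.items()
        if not any(pair <= ev for pair in bad_pairs)
    }


intersections = {
    frozenset({"E1", "E2"}): (2, 2, 4, 4),
    frozenset({"E1", "E3"}): (3, 3, 4, 4),
    frozenset({"E2", "E3"}): (3, 3, 6, 6),
    frozenset({"E1", "E2", "E3"}): (3, 3, 4, 4),
}
category = {"E1": "house", "E2": "road", "E3": "shadow"}
in_conflict_with = {frozenset({"house", "road"})}

remaining = remove_conflicting(intersections, category, in_conflict_with)
print(sorted(sorted(ev) for ev in remaining))
```

Checking pairs suffices because any region shared by three or more pieces of evidence is also shared by each pair among them.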

In the above case, if both E1 and E2 are instances, the High
Level Reasoning Expert records them as conflicting and uses that fact
to establish the inconsistency of situations containing hypotheses
generated by conflicting instances. (See Section 4.4.4.)

A shortcoming  of  our  approach  to  evidence  accumulation  is  that 
negative  sources  of  evidence  are  not  considered  in  assessing  the 
strength  of  a situation.  For  example,  in  medical  diagnosis,  some 
measurements  are  used  to  deny  the  possibility  of  certain  classes  of 
diseases. Incorporation of sources of negative evidence is an important
issue for future research.

4.4.3. Constraint Consistency

After eliminating all conflicting intersections, the remaining
intersections are checked to determine if their associated sets of
constraints are consistent. Let E1 and E2 denote the non-conflicting
evidence under consideration. One of the following conditions must
hold:

(1) the object categories of E1 and E2 are the same,

(2) there is a path between the two categories consisting of PW
relations,

or

(3) one piece of evidence is a subcategory of the other, according
to the specialization/generalization hierarchy.

In the second case, since the names of the attributes used in
the constraints associated with E1 and E2 may be different, they
cannot, in general, be directly compared. Suppose the object category
of E1 is at a higher level in the hierarchy than that of E2. The
constraints associated with E2 are translated into those for the object
category of E1 by using part-whole/a-kind-of relations. Then the
translated constraints are compared with those associated with E1.

Figure 11 illustrates the translation of constraints using PW
relations. Constraint C1 on a road piece object is translated into
constraint C2 on a road object. Currently, this translation is done
simply by rewriting the attributes (slot names) of C1 into appropriate
attributes (slot names) of C2 using a "slot name translation table"
for the PW relation (Figure 11(b)).

The properties and/or constraints associated with both pieces of
evidence must be consistent. Both constraints associated with a
hypothesis and properties associated with an instance are represented
by sets of linear inequalities in one variable. A simple constraint
manipulation system is used to check the consistency between the sets
of inequalities by generating the solution space (also represented by
inequalities) to the intersection of sets. If this solution space is
empty, then the constraints are inconsistent. If C1 are the
constraints for E1, C2 for E2, and C for O, the object category to which
both E1 and E2 belong, then we must check that

$C_1 \cap C_2 \neq \emptyset$ and $C_1 \cap C_2 \cap C \neq \emptyset$.

We do this by first computing $C_3 = C_1 \cap C_2$, and if this is non-empty,
finally computing $C_3 \cap C$.
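
Because each inequality involves a single variable, a set of constraints can be viewed as one interval per attribute, and the two-stage check becomes interval intersection. The sketch below assumes closed intervals and hypothetical attribute names and bounds; it is not the constraint manipulation system used in the paper.

```python
import math

UNBOUNDED = (-math.inf, math.inf)


def intersect_constraints(c1, c2):
    """Intersect two constraint sets given as attribute -> (low, high).

    Returns the solution space in the same form, or None if it is empty.
    """
    result = {}
    for attr in set(c1) | set(c2):
        lo1, hi1 = c1.get(attr, UNBOUNDED)
        lo2, hi2 = c2.get(attr, UNBOUNDED)
        lo, hi = max(lo1, lo2), min(hi1, hi2)
        if lo > hi:          # no value satisfies both sets for this attribute
            return None
        result[attr] = (lo, hi)
    return result


def consistent(c1, c2, c):
    """First compute C3 = C1 n C2; if non-empty, intersect C3 with C."""
    c3 = intersect_constraints(c1, c2)
    if c3 is None:
        return False
    return intersect_constraints(c3, c) is not None


c1 = {"width": (5, 12)}                        # constraints of E1
c2 = {"width": (10, 20), "length": (30, 100)}  # constraints of E2
c_ok = {"width": (0, 15)}                      # category constraints admitting C3
c_bad = {"width": (13, 50)}                    # category constraints excluding C3

print(consistent(c1, c2, c_ok), consistent(c1, c2, c_bad))
```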

4.4.4. Relations Between Sources of Evidence

The sources of accumulated evidence about a situation must not
be conflicting. Let S1 and S2 denote the source evidence of E1 and
E2, respectively. If a piece of evidence is a hypothesis, its source
evidence is the instance which generated the hypothesis. An instance
is the source evidence for itself. It is possible that S1 and S2 are
mutually conflicting (Figure 12), but that E1 and E2 themselves are
consistent. In such a case, we do not combine E1 and E2 into a
situation; analysis based on such conflicting interpretations is
performed independently.

4.5. Focus of Attention

After examining the consistency among evidence, we next evaluate
the reliability of each consistent situation by summing numerical
reliability measures for each piece of evidence, and select the most
reliable one.


Recall that there are two different types of evidence in our
system: instances and hypotheses. It is possible to control the
direction of the interpretation process by assigning different
reliabilities to them.

If a higher reliability is assigned to an instance than to a
hypothesis, a situation including an instance tends to be selected as
the most reliable one rather than one consisting only of hypotheses.
Therefore the system first builds partial interpretations by
establishing relations among instances before trying to perform top-down
picture processing.
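
The selection rule can be illustrated with a small sketch. The reliability values below are assumptions chosen only to show the effect of weighting instances above hypotheses; the paper does not give the numerical measures actually used.

```python
# Assumed reliabilities; instances are weighted above hypotheses.
RELIABILITY = {"instance": 1.0, "hypothesis": 0.4}


def situation_reliability(situation):
    """Sum the reliability of every piece of evidence in the situation."""
    return sum(RELIABILITY[kind] for _name, kind in situation)


def select_focus(situations):
    """Return the most reliable of the consistent situations."""
    return max(situations, key=situation_reliability)


situations = [
    [("H1", "hypothesis"), ("H2", "hypothesis"), ("H5", "hypothesis")],
    [("H4", "hypothesis"), ("RP2", "instance")],
]
focus = select_focus(situations)
print([name for name, _kind in focus])
```

With these assumed weights the situation containing an instance wins even though the other one combines more pieces of evidence, which is the bottom-up-first behavior described above.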

5. Resolving a Situation

As described in Section 4.2, one of two actions is taken in
order  to  resolve  a situation:  confirm  relations  between  instances  or 
activate  top-down  analysis. 

How a situation is resolved depends on the nature of its constituent
evidence. If the pieces of evidence are all hypotheses, then a
composite  hypothesis  is  constructed  for  transmittal  to  the  MSE,  and 
any  instance  extracted  from  the  image  is  then  examined  by  the  source 
instances  of  those  hypotheses.  If  a situation  includes  both 
hypotheses  and  instances,  then  the  instances  are,  in  turn,  examined 
by  the  sources  of  the  hypotheses,  and  if  none  satisfy  the  hypotheses, 
then  a composite  hypothesis  can,  in  turn,  be  transmitted  to  the  MSE. 

5.1. Resolution Process

The  system  provides  a description  of  its  proposed  resolution  to 
a situation  to  all  instances  involved  in  that  situation.  Each 
instance  then  evaluates  the  proposed  solution  according  to  its 
specific  expectations. 

In what follows, the process of resolving a situation is
illustrated by the example shown in Figure 13. Suppose the consistency
reasoner selected the overlapping region between two hypotheses
generated from two road-piece instances RP1 and RP2 (Figure 13(a)). In
the symbolic data structure, RP1 and RP2 are linked to their parent
road instances RD1 and RD2 by PW relations, respectively. The
hypotheses for adjacent road pieces have been generated by these
parent instances.


Since this situation consists only of hypotheses, the system
activates top-down analysis to find a road piece in the overlapping
region. This request is issued to the Model Selection Expert
together with the supporting evidence (i.e., RD1 and RD2), so that the
expert can use any available contextual information.


Assume that a new road-piece instance, RP3, is created (Figure
13(b)). Then, the system provides this result to the instances
involved in the situation, namely RD1 and RD2.


Suppose RD1 is the first to be informed of the proposed
resolution. RD1 examines whether or not RP3 satisfies all constraints
required to establish relation HI. In this case, however, RP3 fails,
because RP3 is not adjacent to RP1. This failure activates an
exception handler, which issues a top-down request to find a road piece
between RP1 and RP3 (see Figure 13(c)).


Assume that another new road-piece instance, RP4, is
detected (Figure 13(d)). Since RP4 is adjacent to RP1, RD1 establishes
a PW relation to RP4, and then to RP3.


Figure 13(e) shows the data organization after the same analysis
is performed by RD2. In this case, however, when RD2 establishes a PW
relation to RP3, an exception handler in RP3 is triggered, because
RP3 has two different parents. More specifically, after RD2
establishes a PW relation to RP3, RD2 asks RP3 to check its reverse
relation from RP3. An exception handler is activated as a result of this
checking process. This handler issues a request to the system to
examine the consistency between two parents. If they are consistent,
the system merges the two PW hierarchies below them into one (Figure
13(f)). An exception handler of this kind is associated with each PW
relation in order to construct a complete PW hierarchy by merging a
pair of partial hierarchies.

There are several stages in the above example where the top-down
request might have failed. In general, the Model Selection Expert has
the ability to deal with such failures. Figure 14 shows a partial
knowledge structure for suburban scenes. The Model Selection Expert
analyzes the request to find RP3 (Figure 13(a)) by first assuming the
road piece to be detected is a visible road, and issues a request to
the Low Level Vision Expert. If this request fails, the Model
Selection Expert switches to the other appearance of a road piece, i.e., an
occluded road. The selection between overpass and shadowed road is
done based on the cause of the failure. For example, if the cause of
the failure is that the gray level in the overlapping region is too
dark compared to the expected gray level, then the expert will
hypothesize a shadowed road. If all efforts by the Model Selection
Expert fail, this is reported to the High Level Expert. Then the
system reports this to RD1 and RD2, which trigger their relevant
exception handlers. Since different new hypotheses may be generated by
such exception handlers, no immediate further analysis is activated.
Instead, these hypotheses are combined in the next interpretation
cycle. In the case of Figure 13, RD1 and RD2 would both generate
hypotheses for a road terminator.
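
The fallback between appearance models might be sketched as follows. Everything here is hypothetical: the `detect` callback stands in for a request to the Low Level Vision Expert, the failure-cause strings are invented labels, and choosing "overpass" for every cause other than a dark region is an assumption, since the text gives only the shadowed-road example.

```python
def find_road_piece(detect, region):
    """Try the visible-road appearance first, then an occluded-road appearance.

    detect(appearance, region) is assumed to return (instance, failure_cause),
    with instance set to None when detection fails.
    """
    instance, cause = detect("visible-road", region)
    if instance is not None:
        return instance
    # Pick the occluded appearance from the cause of the failure.
    fallback = "shadowed-road" if cause == "too-dark" else "overpass"
    instance, _cause = detect(fallback, region)
    return instance  # None means the failure is reported to the High Level Expert


def fake_detector(appearance, region):
    # Stand-in detector: the region is too dark for a visible road.
    if appearance == "visible-road":
        return None, "too-dark"
    if appearance == "shadowed-road":
        return {"category": "road-piece", "appearance": appearance,
                "region": region}, None
    return None, "not-found"


rp3 = find_road_piece(fake_detector, (10, 0, 20, 4))
print(rp3["appearance"])
```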

If a top-down request issued by an instance fails, the instance
activates another exception handler, if any. If all trials fail, the
instance reports this to the system. Then the system activates
another instance involved in the focused situation. The initial
failure is not taken into account in any way by the system; this is a
shortcoming of the present system.

5.2. Merging a Pair of Partial PW Hierarchies

If a part instance is shared by two parent instances, the part
issues a request to check the "similarity" between the parents. If
they are similar, the system merges them into one.

Similarity  examination  involves  checking  whether  or  not  the  two 
parent  instances  denote  (perhaps  different  pieces  of)  the  same  object. 
For  example,  RD1  and  RD2  in  Figure  13(e)  should  be  merged  into  one 
road,  although  they  do  not  denote  the  same  (portion  of)  road. 
Knowledge  about  the  continuity  of  roads  is  crucial  in  this  example. 

The more reliable of the two instances to be merged checks
whether or not the part instances of the other instance are
consistent with that more reliable parent. The more reliable parent may
decide to merge with the other parent, decide that such a merge is not
(and will never be) possible (which places them in conflict), or decide
that sufficient information is not available to make a decision.

Figure 15 illustrates an example of the third case. The
definition of a house group is a group of regularly arranged houses which
face the same side of the same road. As shown in Figure 15, if two
house group instances share a house instance, the similarity
examination is performed. If both house group instances face the same side
of the same road instance, then they are similar and are merged into
one. On the other hand, if one of them has not established such a
"facing" relation, then it is not possible to verify the similarity
between them. Moreover, even if the two house group instances have
established "facing" relations to different road instances, it is
still possible for them to be similar, because those road instances
may be merged later. The house group instances can be regarded as
conflicting only if their facing road instances are in conflict.

If  the  result  of  the  similarity  examination  is  "inconclusive", 
the  system  records  the  causes  of  the  failure  and  suspends  the  action 
of establishing a new PW relation from a parent instance to the
shared  part  instance.  In  the  case  shown  in  Figure  15,  the  relation 
between  HG1  and  H3  is  suspended.  The  system  records  all  suspended 
actions  together  with  their  causes.  The  suspended  action  can  be 
reactivated  if  its  cause  is  resolved  by  analysing  other  situations. 
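
The three possible outcomes of the similarity examination, and the suspension of an inconclusive action, can be sketched for the house-group case as below. The dictionary layout, the `facing` key, the instance names and the cause string are hypothetical. Two groups facing different sides of the same road are not covered by the text, and this sketch simply leaves that case inconclusive.

```python
from enum import Enum


class Similarity(Enum):
    MERGE = "merge"
    CONFLICT = "conflict"
    INCONCLUSIVE = "inconclusive"


def house_group_similarity(hg1, hg2, roads_in_conflict):
    """Compare two house groups that share a house instance.

    Each group carries 'facing' = (road_name, side), or None if the
    "facing" relation has not been established yet.
    """
    f1, f2 = hg1.get("facing"), hg2.get("facing")
    if f1 is None or f2 is None:
        return Similarity.INCONCLUSIVE       # similarity cannot be verified
    if f1 == f2:
        return Similarity.MERGE              # same side of the same road
    if frozenset({f1[0], f2[0]}) in roads_in_conflict:
        return Similarity.CONFLICT           # facing roads are in conflict
    return Similarity.INCONCLUSIVE           # the roads may be merged later


suspended = []  # (parent, part, cause) records for suspended PW relations

hg1 = {"name": "HG1", "facing": None}
hg2 = {"name": "HG2", "facing": ("RD1", "north")}
result = house_group_similarity(hg1, hg2, roads_in_conflict=set())
if result is Similarity.INCONCLUSIVE:
    # Suspend the new PW relation and remember why, so it can be retried.
    suspended.append(("HG1", "H3", "HG1 has no facing relation yet"))

print(result, suspended)
```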

6. Experimental Results

The image used in our experiment is a 320 by 160 portion of an
aerial  photograph  (Figure  16)  with  intensities  in  the  range  of  0 to 
63.  The  scene  contains  houses,  roads,  road  intersections,  trees,  and 
driveways. 

The appearance models are a subset of the possible models for
suburban housing developments. Currently, we deal only with houses,
road pieces, road intersections, and the spatial relations among
them. Figure 14 shows the suburban housing development model used. In
this section, we describe how our system proceeds to construct a road
network interpretation from the image.

The system's analysis starts with a segmentation of the image.
Since the houses and road pieces are modeled by compact and elongated
rectangles, such rectangles are first extracted from the image. A
simple blob finder and ribbon finder are used to find blobs and
ribbons in the image.

Elongated  rectangles  are  extracted  and  instantiated  as  road 
piece  instances.  These  instances  constitute  the  initial  entries  in 
the iconic database. Figure 16 shows the initial road-piece
instances  extracted  from  the  image.  As  can  be  seen,  roads  are  broken 
into  pieces. 

In the first of the interpretation cycles, the system
checks each instance and, for each relation, creates a hypothesis (for
an SP relation or a top-down usage of a PW relation) or an
instance (for a bottom-up usage of a PW relation), if possible, and
inserts it into the database. Since some of the relations may depend
on yet undetermined values stored in frame slots, not all relations
may be hypothesized at this point.

In the second cycle, the system's focus of attention mechanism
selects the most promising situation. After a situation is selected,
the system resolves it by first proposing a solution to it and then
broadcasting messages to the source instances. Each source instance
checks the proposed solution and requests the MSE to do top-down
analysis if necessary. Also, the system may reorganize the
database (e.g., unification of instances) during the resolution
process.

In the current experiment, the MSE is simulated by a human. The
descriptions of the action and the situation are displayed on the
screen. The description of the result is entered from the terminal
and is instantiated as an object instance and returned to the system.

Figures 17 - 23 show how the system proceeds to select a situation,
resolve the selected situation, and reorganize the database as
the result of resolving that situation. Figures 17 and 18 show two
road-piece instances RP1, RP2, their parent instances RD1, RD2, and
the hypotheses that RD1 and RD2 generate. During the hypothesis
creation cycle, instances RD1 and RD2 create hypotheses H1, ..., H8.
Hypothesis H4 and instance RP2 overlap (Figure 19.a). The system picks
this situation (H4 and RP2 are consistent) and proceeds to resolve it.


Let C be the summarized constraints derived from the constraints
of H4 and RP2. Since RP2 satisfies the constraint C, the system uses
it as a proposed answer. RD1 checks the proposed solution, RP2, for
adjacency. However, RP2 is not adjacent to RD1, so RD1 issues a
top-down request to the MSE to find a road piece instance to connect
RD1 and RP2. Currently, such a request is displayed on the screen and
the result is entered from the terminal. The result can either be
success, in which case the description of the instance (object type
and region description) is entered, or failure.

The description of a road piece instance (RP3) is entered from
the terminal. The MSE instantiates the instance and inserts it into the
database. The MSE reports RP3 to RD1. RD1 checks if RP3 is adjacent to
RD1. Since RP3 is adjacent to RD1, RD1 establishes a PW link to
RP3 (Figure 20.b). Finally, RD1 checks MSG-A again and succeeds (since
RD1 contains RP1 and RP3). A PW link is established between RP2 and
RD1 (Figure 20.c). As a result, RP2 belongs to two parents. The system
tries to unify them by checking if RD1 and RD2 are similar. In this
case, they are similar. The system unifies RD1 and RD2 into a single
instance (say RD'). After the unification, road instance RD' has three
parts (RP1, RP2, and RP3). Figure 21 shows the road instance RD' and
its three parts. Figure 22 shows all the road instances after the
selected situation is resolved.

During the unification process, several instances are merged
into a single instance. The hypotheses generated by the merged
instances are removed from the database. A new set of hypotheses is
generated in the next hypothesis creation cycle. Figure 23 shows the
new hypotheses generated by RD'. Note that the original hypotheses
H1, ..., H8 generated by RD1 and RD2 have been removed from the
database.
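The bookkeeping performed during unification can be sketched as below. The dictionary layout of the database and the function `unify` are assumptions made for illustration; the paper does not specify how the iconic database is stored.

```python
def unify(db, a, b, new_name):
    """Merge road instances a and b into one instance called new_name.

    db["parts"]      maps an instance name to the list of its part names.
    db["hypotheses"] maps a hypothesis name to the instance that generated it.
    Both layouts are hypothetical.
    """
    merged_parts = sorted(set(db["parts"].pop(a)) | set(db["parts"].pop(b)))
    db["parts"][new_name] = merged_parts
    # Hypotheses generated by the merged instances are removed; replacements
    # would be produced in the next hypothesis creation cycle.
    db["hypotheses"] = {
        h: source for h, source in db["hypotheses"].items()
        if source not in (a, b)
    }
    return db
```

Applied to the example in the text, merging RD1 (parts RP1, RP2, RP3) with RD2 (part RP2) leaves a single road instance with three parts and no hypotheses inherited from either parent.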

Figure 24 shows a case where alternate hypotheses are generated.
A road can either be extended continuously, or stop at a road
terminator. One way to conduct the search is to look for the adjacent
road piece first. If that search fails, then the search for a road
terminator can start. Such a strategy is illustrated in Figure 24.a.
Figure 24.b shows a road instance and the alternate hypotheses it
generates during the process.
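The search order described above (adjacent road piece first, road terminator only if that fails) amounts to a simple fallback. In the sketch below the two search callables and the function name `extend_road` are hypothetical stand-ins for the system's top-down requests.

```python
def extend_road(find_adjacent_piece, find_terminator):
    """Try the adjacent-road-piece hypothesis first; fall back to the
    road-terminator hypothesis only if the first search fails.

    Each argument is a zero-argument callable returning an instance
    name, or None when the search fails.
    """
    piece = find_adjacent_piece()
    if piece is not None:
        return ("road-piece", piece)
    terminator = find_terminator()
    if terminator is not None:
        return ("road-terminator", terminator)
    return ("unresolved", None)
```

Ordering the alternatives this way means the terminator search is never issued for roads that simply continue.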

Figure 25.a shows the final result of constructing the road
network interpretations by the system. The interpretation graphs are
shown in Figure 25.b. Each node represents an instance. There are 29
road piece instances, 10 road instances, and 5 road terminator
instances. Figure 26 shows the road joint instance J1 and all road
instances meeting there. Figure 27 shows road instance R2, the road
terminator instances adjacent to it, and its part objects.
[Diagram labels: hypothesis generation; instance of object O2; hypothesis for object O1]

Fig. 2 Hypothesis generation based on functional
representation of a relation


Frame name : Road piece
Slot name  : Length
             Width
             Direction
             Coordinate of the local coordinate system
             Father

(1) The description of the road piece frame

Frame name : Road
Slot name  : Total-length
             Average-direction
             Left-adjacent-road-piece
             Right-adjacent-road-piece
             Left-connecting-road-terminator
             Right-connecting-road-terminator
             Left-neighboring-house-group
             Right-neighboring-house-group

(2) The description of the road frame

Figure 3 : (a) The description of the road frame and the
road piece frame.


(1) Iconic description of hypothesis H

(AND (EQUAL OBJECT-TYPE ROAD)
     (AND (LESSP TOTAL-LENGTH 100)
          (GREATERP TOTAL-LENGTH 50))
     (AND (LESSP AVERAGE-WIDTH 15)
          (GREATERP AVERAGE-WIDTH 10))
     (AND (LESSP AVERAGE-DIRECTION 50)
          (GREATERP AVERAGE-DIRECTION 30)))

(2) Symbolic description of hypothesis H

Figure 3 : (b) The description of a road hypothesis H
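To make the symbolic description concrete, the following sketch evaluates such a constraint against the slot values of a candidate instance. The nested-tuple encoding and the function `holds` are assumptions made here; the system itself presumably evaluates the expression directly in LISP.

```python
def holds(expr, slots):
    """Evaluate a constraint written with AND, EQUAL, LESSP and GREATERP
    against a dict of slot values. The tuple encoding is hypothetical."""
    op, *args = expr
    if op == "AND":
        return all(holds(arg, slots) for arg in args)
    slot, value = args
    if op == "EQUAL":
        return slots[slot] == value
    if op == "LESSP":
        return slots[slot] < value
    if op == "GREATERP":
        return slots[slot] > value
    raise ValueError(f"unknown operator {op}")


# Hypothesis H from Figure 3(b), transcribed as nested tuples.
H = ("AND",
     ("EQUAL", "OBJECT-TYPE", "ROAD"),
     ("AND", ("LESSP", "TOTAL-LENGTH", 100), ("GREATERP", "TOTAL-LENGTH", 50)),
     ("AND", ("LESSP", "AVERAGE-WIDTH", 15), ("GREATERP", "AVERAGE-WIDTH", 10)),
     ("AND", ("LESSP", "AVERAGE-DIRECTION", 50),
             ("GREATERP", "AVERAGE-DIRECTION", 30)))
```

A road instance with total length 75, average width 12 and average direction 40 satisfies H, whereas one with total length 120 does not.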
[Diagram title: An Image Understanding System]

[Diagram labels: function h; instance of object O3]

(b)

Fig. 7 Principle of evidence accumulation for spatial
reasoning

[Diagram labels: Pictorial Entity; Pictorial Entity]

Fig. 9 (b) Result of the unification


(AND (EQUAL OBJECT-TYPE ROAD-PIECE)
     (AND (LESSP LENGTH 19)
          (GREATERP LENGTH 14))
     (AND (LESSP DIRECTION 60)
          (GREATERP DIRECTION 45)))

(a) The description of constraint C1.

Slot name translation table

Slot name of road-piece frame    Slot name of road frame
Length                           Total-length
Width                            Average-width
Direction                        Average-direction

(b) Slot name translation table for the PW relation
between the road frame and the road piece frame.


(AND (EQUAL OBJECT-TYPE ROAD)
     (AND (LESSP TOTAL-LENGTH 19)
          (GREATERP TOTAL-LENGTH 14))
     (AND (LESSP AVERAGE-DIRECTION 60)
          (GREATERP AVERAGE-DIRECTION 45)))

(c) The description of constraint C1 after translation.

Figure 11 : Translation of constraints.
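The translation step in Figure 11 can be sketched as a recursive renaming of slot names using the table in part (b). The tuple encoding of the constraint and the function `translate` are hypothetical; only the slot names come from the figure.

```python
# Slot name translation table for the PW relation, from Figure 11(b).
PW_TABLE = {
    "LENGTH": "TOTAL-LENGTH",
    "WIDTH": "AVERAGE-WIDTH",
    "DIRECTION": "AVERAGE-DIRECTION",
}

# Constraint C1 from Figure 11(a), transcribed as nested tuples.
C1 = ("AND",
      ("EQUAL", "OBJECT-TYPE", "ROAD-PIECE"),
      ("AND", ("LESSP", "LENGTH", 19), ("GREATERP", "LENGTH", 14)),
      ("AND", ("LESSP", "DIRECTION", 60), ("GREATERP", "DIRECTION", 45)))


def translate(expr, table, new_type):
    """Rewrite slot names through `table` and set the object type to
    `new_type`; numeric bounds are copied unchanged."""
    op, *args = expr
    if op == "AND":
        return (op, *[translate(arg, table, new_type) for arg in args])
    slot, value = args
    if slot == "OBJECT-TYPE":
        return (op, slot, new_type)
    return (op, table[slot], value)
```

Calling `translate(C1, PW_TABLE, "ROAD")` yields a constraint on the road frame's Total-length and Average-direction slots with the same numeric bounds.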


(a) A road instance RD1 (bottom), the neighboring house group
hypotheses (H1, H2) (middle), and the adjacent road piece
hypotheses (H3, H4).

(b) A depiction of RD1 and the hypotheses it generates.

(c) The interpretation graph of RD1.

Figure 17 : A road piece instance RP1, its parent RD1, and
the hypotheses RD1 generates.

(a) A road instance RD2 (bottom), the neighboring house group
hypotheses (H5, H6) (middle), and the adjacent road piece
hypotheses (H7, H8).

(c) The interpretation graph of RD2.

Figure 18 : A road piece instance RP2, its parent RD2, and
the hypotheses RD2 generates.
(a) A depiction of the situation.

(b) The supporting sources of the situation (bottom), the
region of top-down prediction request (middle), the
road piece instance entered from the terminal (top).

Figure 19 : A situation.

(a) The interpretation graphs before resolving the situation.

(b) The interpretation graphs after road piece RP3 is entered
into the iconic database.

(c) The interpretation graph after RD1 rechecks its message.

(d) The interpretation graph after the unification of RD1 and
RD2.

Figure 20 : The interpretation graphs during the resolution
of a situation.
Figure 21 : Resolving a situation. Road instance RD3 (bottom)
and its part objects (RP1, RP2, and RP3) (top).

Figure 22 : All road instances after the situation
is resolved.

Figure 23 : Update of hypotheses: road instance RD3 (bottom),
neighboring house group hypotheses (middle),
adjacent road piece hypotheses (top).

(b) A road instance (bottom), adjacent road piece
hypotheses (middle), and adjacent road terminator
hypothesis (top).

Figure 24 : Change of hypotheses.

(a) All road piece instances (bottom), all road
instances (middle), and all road terminator
instances (top).

Figure 25 : Final interpretation of the road network.
Figure 26 : Road joint instance J1 (bottom) and all road
instances intersecting at J1 (top).

Figure 27 : Road instance R2 (bottom), road terminator
instances adjacent to it (middle), and road piece
instances contained in it (top).
ABSTRACT

Texture is an important image characteristic, and a variety of
spatial domain techniques have been proposed for extracting and
utilizing textural features for segmenting and classifying images.
For the most part, these spatial domain techniques are ad hoc in
nature. In this paper, we discuss a Markov random field model for
image texture, and derive a frequency domain description of image
texture in terms of the power spectral density. This model can be
used for designing optimum frequency domain filters for enhancing,
restoring and segmenting images based on their textural properties.

1. INTRODUCTION

Image texture is made up of two components: a set of primitive
elements and a structural arrangement [1, 2, 3, 4]. For example, in
a photographic image of a residential area the primitive elements
are roads, houses, and trees, and the structural arrangement is the
layout of the area. With an appropriate set of primitive elements
and statistical models for the structure, it is possible to describe
image texture as a Markov random process, and derive a frequency
domain model in the form of power spectral densities. The power
spectral densities can be used to derive frequency domain algorithms
for processing texture information.

For  purposes  of  illustrating  the  modeling  approach  that  can  be 
used,  consider  the  one  dimensional  textural  pattern  shown  in  Figure 
1.  This  texture  is  made  up  of  two  primitive  elements,  rectangular 
and  triangular  in  shape  with  the  heights  representing  tonal 
variations.  The  location  of  the  primitive  elements  is  given  by  the 
sequence $\{t_i\}$. We can model this class of textural patterns as

$$x(t) = \sum_{i=-\infty}^{\infty} a_i \, p_{m_i}(t - t_i) \qquad (1)$$

where

$\{a_i\}$ : amplitude sequence,

$\{t_i\}$ : location of the i-th primitive,

$p_1, p_2, \ldots, p_N$ are N primitive elements, and

$\{m_i\}$ : $m_i \in [1, 2, \ldots, N]$ indicates which one
of the N primitives is present at the i-th location.
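A minimal discrete sketch of equation (1) is given below, assuming an integer sample grid and primitives stored as NumPy arrays. The helper names `rect`, `tri` and `synthesize`, and the zero-based labels, are choices made for this illustration only.

```python
import numpy as np


def rect(n):
    """Rectangular primitive of n samples with unit height."""
    return np.ones(n)


def tri(n):
    """Triangular primitive of n samples with unit peak in the middle."""
    return 1.0 - np.abs(np.linspace(-1.0, 1.0, n))


def synthesize(length, locations, amplitudes, labels, primitives):
    """Discrete form of x(t) = sum_i a_i * p_{m_i}(t - t_i).

    locations:  integer sample positions t_i (0 <= t_i < length)
    amplitudes: a_i
    labels:     m_i, here zero-based indices into `primitives`
    """
    x = np.zeros(length)
    for t_i, a_i, m_i in zip(locations, amplitudes, labels):
        p = primitives[m_i]
        end = min(length, t_i + len(p))   # clip primitives at the right edge
        x[t_i:end] += a_i * p[: end - t_i]
    return x
```

For instance, `synthesize(10, [0, 5], [2.0, 1.0], [0, 1], [rect(3), tri(5)])` places a rectangle of height 2 at sample 0 and a unit triangle starting at sample 5.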


The complexity of the textural patterns and their models depends on
the nature of $\{t_i\}$, $\{a_i\}$, $\{m_i\}$ and $\{p_i\}$, and the
following assumptions can be made about these sequences:

1.a. $\{t_i\}$ : uniformly distributed locations, or
1.b. $\{t_i\}$ : Poisson sequence with an exponential
     distribution for inter-location distance
2.a. $\{a_i\}$ : constant
2.b. $\{a_i\}$ : i.i.d. sequence of random variables, or
2.c. $\{a_i\}$ : [illegible]
3.a. $p_1, p_2, \ldots$ : [illegible]
3.b. $p_1, p_2, \ldots$ : [illegible]
4.a. $\{m_i\}$ : [illegible]

2. POWER SPECTRAL DENSITY OF MARKOVIAN TEXTURE FIELDS


If we assume that

$\{t_i\}$ : uniformly distributed
$\{a_i\}$ : constant
$p_1, p_2, \ldots, p_N$ : deterministic
$\{m_i\}$ : Markov

$P\{\text{occurrence of } p_k\} = \pi_k$

$P\{\text{occurrence of } p_i \text{ followed by } p_j \text{ after } n \text{ locations}\} = p_{ij}^{(n)}$

then the texture can be described by the Markov random field
(process)

$$x(t) = \sum_{i=-\infty}^{\infty} p_{m_i}(t - i T_s) \qquad (2)$$

where

$T_s$ = average spacing between elements

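It can be shown [5, 6] that the power spectral density of x(t)
is given by

$$G_x(f) = \frac{1}{T_s^2} \sum_{n=-\infty}^{\infty} \left| \sum_{j=1}^{N} \pi_j S_j\!\left(\frac{n}{T_s}\right) \right|^2 \delta\!\left(f - \frac{n}{T_s}\right) + \frac{1}{T_s} \sum_{j=1}^{N} \pi_j \left| S_j'(f) \right|^2 + \frac{2}{T_s} \, \mathrm{Re}\left\{ \sum_{j=1}^{N} \sum_{k=1}^{N} \pi_j \, S_j'^{*}(f) \, S_k'(f) \, Q_{jk}(f) \right\}$$

where

$$S_k(f) = F\{p_k(t)\}$$

$$S_k'(f) = F\Big\{ p_k(t) - \sum_{j=1}^{N} \pi_j \, p_j(t) \Big\}$$

and

$$Q_{jk}(f) = \sum_{n=1}^{\infty} p_{jk}^{(n)} \exp(-i 2 \pi n f T_s)$$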
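As a consistency check on this expression, consider the special case in which the labels are statistically independent, so that the n-step probability of p_j being followed by p_k equals $\pi_k$ for every $n \geq 1$. This case is introduced here only for illustration and is not treated in the text; the sum over n is understood formally.

```latex
% Independent labels: p_{jk}^{(n)} = \pi_k for all n >= 1, hence
%   Q_{jk}(f) = \pi_k \, q(f), \quad q(f) = \sum_{n=1}^{\infty} e^{-i 2 \pi n f T_s}.
\begin{align*}
\sum_{j=1}^{N}\sum_{k=1}^{N} \pi_j \, S_j'^{*}(f) \, S_k'(f) \, Q_{jk}(f)
  &= q(f) \Big( \sum_{j=1}^{N} \pi_j S_j'^{*}(f) \Big)
          \Big( \sum_{k=1}^{N} \pi_k S_k'(f) \Big), \\
\sum_{k=1}^{N} \pi_k S_k'(f)
  &= \sum_{k=1}^{N} \pi_k S_k(f)
     - \Big( \sum_{k=1}^{N} \pi_k \Big) \sum_{j=1}^{N} \pi_j S_j(f) = 0 .
\end{align*}
% The cross term therefore vanishes, and the continuous part reduces to
\begin{equation*}
\frac{1}{T_s} \sum_{j=1}^{N} \pi_j \, |S_j'(f)|^2
  = \frac{1}{T_s} \Big[ \sum_{j=1}^{N} \pi_j \, |S_j(f)|^2
      - \Big| \sum_{j=1}^{N} \pi_j S_j(f) \Big|^2 \Big].
\end{equation*}
```

In other words, when there is no memory between successive primitives the Markov cross term drops out, and only the line spectrum and the variance-like continuous term remain.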


By letting $t = (x, y)$, $f = (f_x, f_y)$, $p(t) = p(x, y)$, $S(f) =
S(f_x, f_y)$, the model can be extended to the two dimensional case.
It can also be generalized to the cases where $\{t_i\}$, $\{a_i\}$ and
$\{p_i\}$ satisfy the assumptions mentioned in the previous section.

3.  DISCUSSION 


A generalized Markov model for image texture can be derived
using the formulation given in this paper. Frequency domain
properties of textural patterns can be obtained from the model and
the power spectral density can be used to develop frequency domain
algorithms for extracting and processing textural information.
ABSTRACT

Rectification of single and overlapping multiple scanner frames is carried out using a
newly developed comprehensive parametric model. Tests with both simulated and real
image data have shown that this model is in general superior to the widely used
polynomial model, and that the simultaneous rectification of overlapping frames using least
squares techniques yields a higher accuracy than single frame rectification due to the
inclusion of tie points between the image frames. Used as control, edges or lines, which
are much more likely to be found in images, can replace conventional control points
and can easily be implemented into the least squares approach. An efficient algorithm
for finding corresponding points in image pairs has been developed which can be used
for determining tie points between image frames and thus increase the economy of the
whole rectification procedure.
1. INTRODUCTION
1.1 General

Imaging, using scanners as sensors, yields the sensed data about the object in the
form of pixels. Knowledge of the relative and/or absolute locations of these pixels in
the object space is necessary for mapping, classification, and change detection or
monitoring. Of primary interest is scanner imagery of the surface of the earth. The process
of finding the location of pixels on the ground for this type of imagery is called
rectification. If the reference is another image, the process is known as registration.
This research covers rectification and registration of scanner imagery produced by
satellite-borne scanners such as LANDSAT MSS imagery. An important element of
this research concerns correspondence between two images or between an image and a
representation of the terrain (i.e. a map).

If the position of the sensor platform (i.e. satellite) and the attitude of the sensor at
the moment of sampling a given pixel are known, and if the interior geometry of the
scanner at the same instant can be reconstructed, then the ground position of a pixel
can be derived with some assumptions regarding the shape of the terrain. The satellite
position can be derived from satellite tracking data. The sensor attitude can be
supplied by attitude sensors on-board the satellite. The geometry of the sensor is
reconstructed using calibration data and the imagery. Unfortunately, the accuracy of the
satellite position and sensor attitude measurements is not sufficient to produce
sub-pixel rectification accuracy.

An alternative method for rectifying satellite scanner imagery is through the use of
control information. Control can be in the form of points or edges with known ground
and image locations. In this method, given a suitable mathematical model, the
parameters needed for relating the image positions of pixels to their ground positions are first
computed using control points and applying an appropriate adjustment procedure.
Then the ground positions of pixels are computed using the same model and the
derived parameters. The same method, with slight modifications, can also utilize edges
as control instead of points.

The  above  method  can  be  further  subdivided  into  two  approaches.  The  first  is  the 
interpolative  or  the  surface  fitting  approach.  This  approach  uses  a mathematical  series 
(e.g.,  polynomial,  harmonic)  to  approximate  the  true  mathematical  model  relating  the 
image  position  of  pixels  to  their  corresponding  ground  position.  This  approach  requires 
an  excessive  number  of  control  points  for  uniform  rectification  accuracy. 
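A minimal sketch of the interpolative approach is shown below, assuming a second-degree polynomial in image row and column fitted to control points by ordinary least squares. The degree, the function names, and the use of `numpy.linalg.lstsq` are choices made for this illustration and are not taken from the cited work.

```python
import numpy as np


def design(rc):
    """Second-degree polynomial terms in image (row, col):
    1, r, c, r*c, r^2, c^2."""
    r, c = rc[:, 0], rc[:, 1]
    return np.column_stack([np.ones_like(r), r, c, r * c, r**2, c**2])


def fit_polynomial(rc, ground):
    """Least squares fit of ground (X, Y) as polynomials in (row, col).

    rc:     (n, 2) float array of control point image positions
    ground: (n, 2) array of the corresponding ground positions
    Returns a (6, 2) coefficient array.
    """
    coeffs, *_ = np.linalg.lstsq(design(rc), ground, rcond=None)
    return coeffs


def apply_polynomial(coeffs, rc):
    """Ground positions predicted for image positions rc."""
    return design(rc) @ coeffs
```

With six coefficients per ground coordinate, at least six well distributed control points are needed, and many more in practice, which illustrates why this approach is demanding in control.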

The second approach is commonly called "parametric". In this approach, the
mathematical model used is based on the geometry of the imaging process. Because of
this, it is possible to develop highly accurate models. However, simplifying
assumptions are usually made to make the resulting model tractable, since the geometry of the
satellite scanner image is very weak. In this approach, it is possible to exploit a-priori
knowledge of the satellite position and sensor attitude, effectively combining the two
main methods discussed above.

Both  methods  mentioned  are  normally  used  for  rectifying  single  frames  of  satellite 
scanner  imagery.  This  requires  that  some  assumptions  be  made  regarding  the  shape  of 
the  terrain  covered  by  one  image  frame.  Improvement  in  accuracy  can  be  gained  if 
overlapping  frames  of  imagery  are  rectified  simultaneously  in  a procedure  commonly 
known  as  block  adjustment. 


1.2  Review  of  the  Literature 

The earliest approach to rectification utilized interpolative or surface fitting models
such as polynomials. This model is easy to implement and gives results comparable to
most early forms of the parametric model for satellite imagery (Forrest [13], Trinder
[42], Bahr [1], Dowman [10]).

The parametric model based on the geometry of the scanner imaging process has
many variations depending on the simplifying assumptions made. The simplest model,
which is really designed for aircraft scanner data, assumes that the satellite orbit is a
straight line and that the earth is projected onto a mapping plane (Kratky [23],
Konecny [21], Dowman [10]). The orbit of the satellite has been assumed to be a circle
(Forrest [12], Levine [26], Synder [41]) or an ellipse (Bahr [2], Sawada [39]). The earth
has been assumed to be a sphere (Caron and Simon [9], Bahr [2], Sawada [39]) or an
ellipsoid of revolution (Puccinelli [36], Forrest [12], Levine [26], Synder [41]).

The satellite orbit and position can be defined simultaneously in terms of the
satellite position and velocity vectors (Caron and Simon [9], Puccinelli [36]). The position of
a satellite along an assumed orbit can be defined in terms of time varying orbital
parameters (Bahr [2]). Alternatively, the orbital parameters can be assumed constant,
which results in an ideal orbit. Small deviations of the actual satellite position from
the ideal are then modeled as arbitrary functions of time, usually a polynomial series
(Levine [26], Mikhail and Paderes [32]).

The sensor, without the scanning action, is nominally pointed along the vertical.
Small deviations of the sensor attitude with respect to the vertical are modeled as
polynomial functions of time. Bridging of long strips with control at each end only is
feasible through the effective use of a-priori attitude information (Friedmann [16]).
1.3 Scope of Investigation

In the early phase of this research, we derived a comprehensive model considering
that the earth is an ellipsoid of revolution and the orbit of the satellite is an ellipse
(Mikhail and Paderes [32]). All three components of the deviation of the satellite
position from the ideal and the three components of the deviation of the sensor attitude
from the nominal are incorporated into the model.

Using this model, we developed a system for simulating scanner image data both in
the direct and inverse modes. In the direct mode, given the parameters defining the
orbit, time, satellite position deviation, sensor attitude and internal sensor geometry,
and given the ground coordinates of points of interest, the corresponding image row and
column numbers are derived. In the inverse mode, the ground planimetric coordinates
of points are computed given the corresponding image row and column numbers, the
parameters mentioned above, and the shape of the terrain.

This model has been extensively tested using simulated data, and the results were
reported at last year's Symposium (Mikhail and Paderes [32]). Five different sets of
experiments were performed to study the following factors: (1) the effect of error in
parameter estimates on rectification accuracy; (2) the relative performance between our
extensive model, three special cases with simplifying assumptions, and the polynomial
model; (3) the effect of different control densities on rectification accuracy; (4) the
effect of errors in derived image position on rectification accuracy; and (5) the effect
of errors in measured ground position of control points on rectification accuracy.

In  Chapter  2 of  this  report,  additional  tests  of  this  model  using  two  frames  of  real 
data  and  the  corresponding  frames  of  simulated  data  employing  the  same  characterizing 
parameters  as  the  real  data  are  included.  Previous  conclusions  using  purely  synthetic 
data  were  generally  confirmed. 

With the comprehensive model fully developed and tested for the rectification of
single images, effort was directed to the implementation of an extensive block
adjustment program. It is based on the same mathematical model and is designed to
accommodate data from overlapping satellite scanner images. Block adjustment reduces
the amount of control needed to meet a specified level of rectification
accuracy. Synthetic data was used to verify the algorithm and the results are included in
this report.

From the experience gained by analyzing both synthetic and real data,
acceptable rectification results require from 20 to 30 control points. Securing this number of
points is often difficult and costly because well identifiable "point" features are not
abundant. Furthermore, high image and ground positional accuracy for control points
is difficult to achieve. Therefore, the research effort was next directed toward an
alternative type of control. In Chapter 3 of this report, the novel concept of "edge point" is
developed and tested and found to be quite promising. The idea is rather simple in
that a control point can be equivalently thought of as a pair of perpendicular edges.
Therefore, one edge, which may be considerably easier to find and locate accurately,
can be used as control. A point on an edge, which we shall term "edge point", will
have a 2x2 covariance matrix which is almost singular. This is because such a point
provides precise information only in the direction normal to the edge.
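One way to picture the covariance of an edge point is sketched below. It assumes that the position is known precisely normal to the edge and very poorly along it, so that one eigenvalue dwarfs the other; the function name and the particular standard deviations are hypothetical.

```python
import numpy as np


def edge_point_covariance(edge_angle_rad, sigma_normal, sigma_along):
    """2x2 covariance of an edge point in image (or ground) axes.

    edge_angle_rad: direction of the edge, measured from the first axis
    sigma_normal:   standard deviation normal to the edge (small)
    sigma_along:    standard deviation along the edge (very large)
    """
    c, s = np.cos(edge_angle_rad), np.sin(edge_angle_rad)
    along = np.array([c, s])     # unit vector along the edge
    normal = np.array([-s, c])   # unit vector normal to the edge
    return (sigma_along**2 * np.outer(along, along)
            + sigma_normal**2 * np.outer(normal, normal))
```

For an edge along the first axis with `sigma_normal=0.5` and `sigma_along=1000.0`, the result is approximately `diag(1e6, 0.25)`, a severely ill-conditioned matrix that effectively constrains only the normal direction.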

Having  generalized  somewhat  the  approach  to  control  by  introducing  the  edge 
points,  effort  is  then  directed  to  the  overall  problem  of  correspondence.  Chapter  4 
reviews  the  general  problem  of  correspondence  and  develops  an  algorithm  for  locating 
corresponding  objects  in  image  pairs.  The  algorithm  is  based  on  a robust  estimation 
procedure  for  the  parameters  of  an  affine  transformation.  It  has  been  tested  on  real 
image  data  with  simulated  distortions,  and  this  early  result  is  given. 
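The text does not say which robust estimator is used. As one possible illustration, the sketch below estimates the six affine parameters by iteratively reweighted least squares with Huber-style weights; the estimator, the tuning constant `k`, and the function names are all assumptions made here and are not claimed to match the algorithm of Chapter 4.

```python
import numpy as np


def fit_affine(src, dst, weights=None):
    """Weighted least squares fit of dst ~ [src, 1] @ params.

    src, dst: (n, 2) arrays of corresponding points
    Returns params of shape (3, 2): a 2x2 linear part stacked on a shift.
    """
    A = np.column_stack([src, np.ones(len(src))])
    w = np.ones(len(src)) if weights is None else weights
    sw = np.sqrt(w)[:, None]
    params, *_ = np.linalg.lstsq(A * sw, dst * sw, rcond=None)
    return params


def robust_affine(src, dst, iterations=20, k=1.5):
    """Iteratively reweighted least squares with Huber-style weights:
    points with large residuals are progressively down-weighted."""
    A = np.column_stack([src, np.ones(len(src))])
    params = fit_affine(src, dst)
    for _ in range(iterations):
        resid = np.linalg.norm(dst - A @ params, axis=1)
        scale = 1.4826 * np.median(resid) + 1e-12  # robust residual scale
        w = np.minimum(1.0, k * scale / np.maximum(resid, 1e-12))
        params = fit_affine(src, dst, w)
    return params
```

The design choice illustrated is that a few grossly mismatched point pairs should not be allowed to bias the transformation, which an ordinary least squares fit cannot guarantee.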
2. IMAGE RECTIFICATION

2.1 Theory

The comprehensive model we derived (Mikhail and Paderes [32]) can be used for
simulating data both in the direct and inverse modes and for rectification. This model
has the following form:

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \lambda \, M \begin{bmatrix} X - X_s \\ Y - Y_s \\ Z - Z_s \end{bmatrix} \qquad (1)$$

where:

$x, y, z$ are the coordinates of a given point in the
image space. These coordinates are functions
of image row and column numbers and the
internal sensor geometry;

$X, Y, Z$ are the corresponding ground coordinates of
the given point;

$X_s, Y_s, Z_s$ are the ground coordinates of the satellite
position when the pixel containing the given point
is sampled. These coordinates are the sum of the
ideal or predicted satellite position and the
deviation of the actual satellite position from
the predicted one. The ideal position is a
function of orbital parameters and time (t) while
the deviations are functions of time (t) only;

$t$ is time, which is a function of pixel row and column
numbers and the internal sensor geometry;

$M$ is an orthogonal rotation matrix which brings the
ground coordinate system into the sensor coordinate
system. This is a function of time, sensor attitude,
deviation of the satellite position from the ideal,
orbital parameters and earth geometry;

$\lambda$ is a proportionality constant which varies from pixel
to pixel (i.e. a scale factor).

In this model, small deviations of the satellite position from the ideal (3
components) and the small sensor attitude deviations from nominal (3 components) are
modeled as third degree polynomial functions of time. Usually, $\lambda$, which is a nuisance
parameter, is eliminated, resulting in:

$$F_1 = \frac{x}{z} - \frac{m_{11}(X - X_s) + m_{12}(Y - Y_s) + m_{13}(Z - Z_s)}{m_{31}(X - X_s) + m_{32}(Y - Y_s) + m_{33}(Z - Z_s)} = 0$$

$$F_2 = \frac{y}{z} - \frac{m_{21}(X - X_s) + m_{22}(Y - Y_s) + m_{23}(Z - Z_s)}{m_{31}(X - X_s) + m_{32}(Y - Y_s) + m_{33}(Z - Z_s)} = 0 \qquad (2)$$
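The ratios on the right-hand side of the two equations in (2) can be evaluated directly once M and the satellite position are known. The sketch below assumes M is supplied as a 3x3 array; the function name `image_ratios` is hypothetical.

```python
import numpy as np


def image_ratios(M, ground, sat):
    """Return (x/z, y/z) predicted for a ground point.

    M:      3x3 rotation matrix (ground system to sensor system)
    ground: (X, Y, Z) of the ground point
    sat:    (Xs, Ys, Zs) of the satellite when the pixel is sampled
    """
    d = np.asarray(ground, dtype=float) - np.asarray(sat, dtype=float)
    u = np.asarray(M, dtype=float) @ d  # proportional to (x, y, z)
    # Taking ratios cancels the per-pixel scale factor.
    return u[0] / u[2], u[1] / u[2]
```

With M equal to the identity, a satellite at (0, 0, 100) and a ground point at (10, 20, 0), the ratios are (-0.1, -0.2).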

These two equations, which are now in a form suitable for rectification, are then
linearized with respect to four groups of variables: (1) the row and column numbers of
a given point; (2) the parameters defining time and satellite orbit; (3) the parameters
defining the satellite position deviation from the ideal and the sensor attitude; and (4)
the ground coordinates of the corresponding point. Other variables defining the internal
geometry of the sensor and the geometry of the earth's shape are held constant.
Variables in the first and fourth groups vary from point to point, while variables in the
second and third groups are constant throughout a whole frame. The linearized
equation has the following general form:

$$A v + B_2 \Delta_2 + B_3 \Delta_3 + B \Delta = f \qquad (3)$$

where:

$v$ is a 2 element vector of residuals for the first group
of variables (i.e. observed row and column numbers for
a given point);

$A$ is a 2x2 matrix of partial derivatives with respect to
the first group of variables;

$\Delta_2$ is an 8 element vector of corrections to the
approximations for the second group of variables;

$B_2$ is a 2x8 matrix of partial derivatives with respect
to the second group of variables;

$\Delta_3$ is a 24 element vector of corrections to the
approximations for the third group of variables;

$B_3$ is a 2x24 matrix of partial derivatives with respect
to the third group of variables;

$\Delta$ is a 3 element vector of corrections to the approximations
for the fourth group of variables (i.e., ground coordinates);

$B$ is a 2x3 matrix of partial derivatives with respect to
the fourth group of variables;

$f$ is a 2 element vector of constants resulting from the
linearization.

The first and fourth groups of variables in the linearization are known because they
are supplied by ground control points. In rectification, the values of the unknown
parameters in the second and third groups of variables are recovered in an adjustment
procedure using control points and the linearized model shown in equation (3).

Because of weak satellite scanner image geometry, not all the unknown parameters
can be solved for simultaneously. Instead, unknown parameters in the second group of
variables are first recovered under the assumption that all parameters in the third
group are zero. This is reasonable since the model is designed such that the parameters
in the third group are as close to zero as possible. Then, using the same set of control
points and the computed values of the parameters in the second group, estimates of the
parameters in the third group are derived. Once estimates of all unknown parameters
are available, the ground coordinates of any other image point can be solved for with
some assumptions regarding the shape of the terrain.
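The two-stage recovery can be sketched on a simplified linear problem in which the stacked observation equations read B2 d2 + B3 d3 = f. The residual term A v of equation (3), weighting, and iteration of the linearization are all omitted here, so this is only an illustration of the ordering of the two solves; the function name `two_stage` is hypothetical.

```python
import numpy as np


def two_stage(B2, B3, f):
    """Stage 1: estimate the second-group corrections d2 with the third
    group held at zero. Stage 2: estimate the third-group corrections d3
    from what stage 1 left unexplained, with d2 held fixed."""
    d2, *_ = np.linalg.lstsq(B2, f, rcond=None)
    d3, *_ = np.linalg.lstsq(B3, f - B2 @ d2, rcond=None)
    return d2, d3
```

When the columns of B2 and B3 are strongly correlated, which is the weak-geometry situation described in the text, the first stage absorbs most of the signal and the second stage estimates only small remaining deviations.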

2.2  Experiments  With  Real  Single  Frame  Data 

Two MSS frames taken by LANDSAT 2 are used in this experiment. The first
frame covers the state of Kansas, which is relatively hilly. It has 153 uniformly distributed
control points. The second frame principally covers the state of Louisiana, which is flat.
About 1/3 of this frame, in the south-east corner, is over the sea. It has 192 well
distributed control points, although not as uniformly distributed as in the Kansas frame.


Ten cases were run for each frame, corresponding to two types of model (collinearity
and polynomial) and five control configurations. For each case, withheld control points
were used as check points. Table 1 shows the results. The collinearity model is
superior to the polynomial model when the control points are few, especially in hilly terrain
such as the Kansas frame. Also, increasing the number of control points beyond 25 has
only a marginal effect on rectification accuracy. This confirms in general our previous
results using simulated data (Mikhail and Paderes [32]). Two additional cases for each
frame were also run in which all the control points were used in the adjustment. The
RMS values of the residuals on control points for the Kansas frame were 58.8 and 57.8 m for
the collinearity and polynomial models, respectively. The corresponding values for the
Louisiana frame were 61.2 and 60.1 m. These values are the upper bounds of the
quality of the data. They are used in the second experiment to determine the precision of
the image measurements input into the simulation.
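As an illustration of the accuracy measure reported in the tables, the sketch below computes an RMS error over check points. The exact definition used in the experiments is not stated here, so the planimetric (horizontal distance) form and the function name `check_point_rms` are assumptions.

```python
import numpy as np

def check_point_rms(estimated_xy, reference_xy):
    """RMS of planimetric position error over check points (assumed definition)."""
    diff = np.asarray(estimated_xy, float) - np.asarray(reference_xy, float)
    # Squared horizontal distance per point, averaged over points, then rooted.
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

# Hypothetical check points (meters): every point is displaced by 50 m.
reference = np.array([[0.0, 0.0], [1000.0, 0.0], [0.0, 1000.0]])
estimated = reference + np.array([[30.0, 40.0], [-30.0, -40.0], [40.0, 30.0]])
rms = check_point_rms(estimated, reference)  # 50.0 for this example
```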

2.3  Experiments  With  Single  Frame  Synthetic  Data

Using  our  extensive  simulation  program,  the  characteristics  of  the  two  real  image 
frames  were  used  to  produce  simulated  images  which  reproduce  as  closely  as  possible 
the  real  images  with  respect  to  control  configuration  and  accuracy.  Simulation  was 
done  in  the  inverse  mode,  where  perfect  ground  coordinates  are  calculated  from  the 
Table 1  RMS Error on Check Points in Meters Using Real Data

Number of                  Kansas                    Louisiana
Control Points    Collinearity    Polynomial  Collinearity    Polynomial
------------------------------------------------------------------------
10*                       68.8         117.1          90.4          96.6
15*                       67.9          73.6          72.3          71.7
25                        67.6          70.4          69.3          67.3
40                        67.9          69.5          66.0          65.4
81/70**                   63.8          65.5          68.4          68.4

* When the number of control points is low, the number of
parameters in the model is reduced to avoid convergence
problems.

** 81 control points for Kansas frame and 70 for Louisiana
frame.


given image coordinates and derived rectification parameters. Then the calculated
ground positions of control points for both frames were perturbed using a normal
distribution with 15 m standard deviation in each of the three coordinates. The image
positions were perturbed using a combination of normal and uniform distributions. The
uniform distribution used for perturbing both frames has a range of -0.5 to +0.5 pixel,
and is used to account for round-off errors. The normal distribution used for
perturbing the Kansas frame has standard deviations of 0.44 pixel in the row direction and
0.40 pixel in the column direction. These are the values which, when used in the simulation
program, produced the RMS values given at the end of the preceding section for the
full-control case. The corresponding standard deviations for the Louisiana frame were
0.40 pixel in the row direction and 0.64 pixel in the column direction. Several sets of
simulated data with the described perturbations but with different "seeds" in the random
number generator were produced and rectified. Table 2 shows the results of rectification
using a representative simulated data set. Comparing Tables 1 and 2, it can be seen that
the trends in Table 1, which resulted from rectification of real data, are duplicated in Table 2.
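The perturbation scheme described above can be sketched as follows. This is not the simulation program itself: it assumes that the "combination" of distributions means the sum of a normal and a uniform deviate, and the function name, argument names, and array layout are hypothetical. The default standard deviations are the Kansas values quoted in the text.

```python
import numpy as np

def perturb_points(ground_xyz, image_rc, seed, ground_sigma_m=15.0,
                   row_sigma_px=0.44, col_sigma_px=0.40):
    """Add simulated random errors to ground (X, Y, Z) and image (row, col) positions."""
    rng = np.random.default_rng(seed)  # a different seed gives a different data set
    # Ground positions: normal noise, same sigma in each of the three coordinates.
    ground = ground_xyz + rng.normal(0.0, ground_sigma_m, size=ground_xyz.shape)
    # Image positions: normal component with separate row and column sigmas ...
    normal = rng.normal(0.0, [row_sigma_px, col_sigma_px], size=image_rc.shape)
    # ... plus a uniform component in [-0.5, +0.5] pixel for round-off (assumed additive).
    uniform = rng.uniform(-0.5, 0.5, size=image_rc.shape)
    return ground, image_rc + normal + uniform

# Two simulated data sets from the same perfect data, differing only in the seed.
perfect_ground = np.zeros((100, 3))
perfect_image = np.zeros((100, 2))
ground_a, image_a = perturb_points(perfect_ground, perfect_image, seed=1)
ground_b, image_b = perturb_points(perfect_ground, perfect_image, seed=2)
```

Fixing the seed makes each simulated data set reproducible, which is convenient when several rectification models are compared on the same perturbed data.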

Simulated data using the control configuration of the two real data frames but
without perturbations were also produced (i.e., perfect data sets). The rectification results
using these perfect data sets are shown in Table 3. From this table, two significant
results can be seen. First, it is possible to recover the correct set of exterior orientation


Table 2  RMS Error on Check Points in Meters Using Simulated
Data

Number of                  Kansas                    Louisiana
Control Points    Collinearity    Polynomial  Collinearity    Polynomial
------------------------------------------------------------------------
10*                       84.0         134.4          80.9          89.9
15*                       76.9          32.0          78.7          79.6
25                        75.4          74.8          72.5          73.8
40                        64.6          64.6          65.0          64.8
81/70**                   61.9          62.9          60.5          61.0

* When the number of control points is low, the number of
parameters in the model is reduced to avoid convergence
problems.

** 81 control points for Kansas frame and 70 for Louisiana
frame.


Table 3  RMS Error on Check Points in Meters Using Perfect Data

Number of                  Kansas                    Louisiana
Control Points    Collinearity    Polynomial  Collinearity    Polynomial
------------------------------------------------------------------------
10*                       11.8         102.5          10.9          15.4
15*                        0.6          13.2           0.3          11.2
25                         0.5          10.8           0.3           9.6
40                         0.5          10.4           0.3           9.6
81/70**                    0.5           9.9           0.3           9.8

* When the number of control points is low, the number of
parameters in the model is reduced to avoid convergence
problems.

** 81 control points for Kansas frame and 70 for Louisiana
frame.


elements using the collinearity model if the data are perfect. Second, and more
importantly, it shows that the systematic error inherent in the polynomial model is about 10
meters.


2.4  Theory  of  Block  Adjustment

Given overlapping strips of scanner imagery, instead of performing rectification
frame by frame, all frames can be rectified simultaneously in one block adjustment.
The main advantage of this approach is that conventional points and edge points
common to many frames, even those with unknown ground coordinates, can be exploited to
increase rectification accuracy. These points are known as tie points. Another
advantage of this method is that mosaicking of large areas is facilitated.

We implemented a block adjustment procedure for satellite scanner imagery
utilizing the same mathematical model used for single frame rectification. In block
adjustment, each point appearing in any frame results in a pair of equations similar to
equation (3), the linearized form of the mathematical model used for single frame
rectification.

Using the method of least squares adjustment (Mikhail [34]), the resulting system of
normal equations is of the form:

(4)  $\begin{bmatrix} \dot{N} & \bar{N} \\ \bar{N}^{T} & \ddot{N} \end{bmatrix} \begin{bmatrix} \dot{\Delta} \\ \ddot{\Delta} \end{bmatrix} = \begin{bmatrix} \dot{t} \\ \ddot{t} \end{bmatrix}$

where:

$\dot{N}$, $\bar{N}$, $\ddot{N}$ are submatrices of the normal equations
coefficient matrix;

$\dot{\Delta}$ is a vector of corrections to the approximations
for the unknown parameters in all frames (i.e.,
$\Delta_2$ and $\Delta_3$);

$\ddot{\Delta}$ is a vector of corrections to the approximations
for the ground coordinates of all points;

$\dot{t}$ and $\ddot{t}$ are the resulting constant vectors.

As an example, consider the block of overlapping imagery shown in Figure 1. There
are 5 image strips overlapping by approximately 60%. Every strip has 4 frames of
imagery and every frame has 9 points in it. The frames are numbered consecutively in
the vertical direction along the direction of the strips. The detailed form of the normal
equations coefficient matrix is shown in Figure 2.

The contribution to the normal equations of the coordinates of ground points ($\ddot{\Delta}$)
is usually eliminated first, resulting in a set of reduced normal equations, which has
the form:

(5)  $N \dot{\Delta} = t$

where:

$N = \dot{N} - \bar{N} \ddot{N}^{-1} \bar{N}^{T}$

$t = \dot{t} - \bar{N} \ddot{N}^{-1} \ddot{t}$

The  reduced  normal  equations  can  be  formed  directly  without  having  to  form  the  total 
normal  equation.  Proper  numbering  of  frames  results  in  a banded  structure  for  the 
reduced  normal  equation  coefficient  matrix  N.  For  the  block  shown  in  Figure  1,  the 
detailed  structure  of  the  reduced  normal  coefficient  matrix  N is  shown  in  Figure  3. 
Each off-diagonal sub-block in Figure 3 is due to points common to a given frame pair.
The existence of these sub-blocks is the reason why block adjustment is more efficient
than single frame rectification. As a matter of fact, block adjustment without tie
points is equivalent to multiple single frame rectification. Efficient algorithms exist to
solve for the unknown parameter corrections in equation (5).
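The elimination of the ground coordinates from the normal equations can be sketched with dense matrices as below. This is only an illustration: the names `N_pp`, `N_pg`, `N_gg`, `t_p`, and `t_g` (parameter block, coupling block, ground-coordinate block, and the two constant vectors) are hypothetical, and a production program would exploit the banded and block-diagonal structure rather than form dense arrays.

```python
import numpy as np

def reduce_normal_equations(N_pp, N_pg, N_gg, t_p, t_g):
    """Eliminate the ground-coordinate unknowns from partitioned normal equations.

    Returns N = N_pp - N_pg N_gg^-1 N_pg^T and t = t_p - N_pg N_gg^-1 t_g.
    """
    # Solve N_gg W = [N_pg^T | t_g] once instead of forming an explicit inverse.
    W = np.linalg.solve(N_gg, np.column_stack([N_pg.T, t_g]))
    N = N_pp - N_pg @ W[:, :-1]
    t = t_p - N_pg @ W[:, -1]
    return N, t

# Synthetic normal equations: 4 parameter unknowns and 3 ground unknowns.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 7))
full_N = A.T @ A
full_t = A.T @ rng.normal(size=30)
p = 4
N, t = reduce_normal_equations(full_N[:p, :p], full_N[:p, p:], full_N[p:, p:],
                               full_t[:p], full_t[p:])
params_reduced = np.linalg.solve(N, t)
params_full = np.linalg.solve(full_N, full_t)[:p]  # same values, larger system
```

Solving the reduced system gives the same parameter corrections as solving the full system, while only requiring a matrix of the size of the parameter block.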

2.5  Experiments  With  A Block  of  Overlapping  Synthetic  Image  Data 

A block of 9 frames, composed of 3 adjacent strips with 3 frames per strip,
was simulated. The center of the block is approximately at 58.5°N latitude. The
frames have about 60% sidelap between strips and 15% overlap along each strip.
There are 454 control points at a grid interval of 20 km, and 453 check points also at a
grid interval of 20 km. The check point grid is displaced by 10 km in both Easting and
Northing, with the result that each control point is surrounded by 4 check points and
vice versa. The ground positions of both sets of points were perturbed with 15 m
standard deviation in each of the three coordinate directions using the normal distribution.
The image positions of both sets were also perturbed using a combination of uniform
and normal distributions. The uniform distribution has a range of -0.5 to +0.5 pixel.
The normal distribution has a standard deviation of 0.5 pixel in both row and column
directions. Five cases of block adjustment were run with different control configurations.
Table 4 shows the number of control and check points for each frame and for the whole
block for each of the 5 cases. It also shows the number of tie points in the block for all
cases. A tie point is any point common to two or more image frames which has known
image positions but unknown ground position and is included in the block adjustment.
In this experiment, the ground elevations of tie points were constrained to their a priori
values. This is necessary because it was previously shown that elevations cannot be
recovered with sufficient accuracy using block adjustment techniques for aircraft
scanner data (McGlone and Mikhail [30]), and aircraft scanner imagery has a much
stronger geometry than satellite scanner imagery.

Table  5 shows  a relative  comparison  of  RMS  errors  on  check  points  on  a frame  by 
frame  basis  between  block  adjustment  and  single  frame  rectification  for  all  five  cases. 
The  case  where  the  parameters  are  perfectly  known  is  included  as  a reference.  It 
Table 4  Number of Control, Tie, and Check Points Used in
Block Adjustment Experiments

            Number of Control Points (by Case)            Number of
Frame            1         2         3         4         5  Check Points
------------------------------------------------------------------------
1               11        15        27        45        91           90
2                9        13        24        39        91           89
3                9        13        26        41        88           87
4               11        15        25        45        89           86
5               11        15        26        42        89           86
6               10        14        26        41        90           86
7               10        14        25        42        88           88
8               10        15        26        44        89           87
9               12        16        26        42        85           88
Block*      42/224    66/212   125/180   214/134     454/0          453

* Control points/tie points.


Table 5  A Comparison of Check Point RMS Error Between Block
Adjustment ($\sigma_{BA}$) and Single Frame Rectification ($\sigma_{SF}$).

            The Ratio $\sigma_{BA}/\sigma_{SF}$ in Meters (by Case)
                                                                       Perfect
Frame            1*         2*          3          4        5**     Parameters
------------------------------------------------------------------------------
1              93/-      79/92      66/76      67/70      66/66             65
2              77/-       76/-      68/79      74/80      69/69             62
3             117/-      100/-      73/81      80/79      79/79             68
4              87/-      77/98      65/73      67/66      65/65             63
5              76/-     74/142      67/73      70/72      68/68             64
6              79/-     74/142      63/69      69/70      63/63             62
7             113/-      70/85      65/72      65/68      65/65             59
8              92/-       97/-      64/81      69/76      68/68             60
9              83/-      72/82      65/78      67/69      68/68             62
Ave.         90.8/-     79.9/-  66.2/75.8  69.8/72.2  67.9/67.9           62.8

* Single frame rectification did not converge because of few control
points (no model parameter reduction is exercised in this case).

** Block adjustment for case 5 is the same as single frame
rectification because there are no tie points.
clearly shows that tie points, which are much more readily available (and less
expensive) than control points, have a beneficial effect on rectification accuracy, especially
when control points are few. This improvement in accuracy is essentially due to tie
points, because block adjustment without tie points is equivalent to single frame
rectification.
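To make the size of this effect concrete, the short sketch below computes the relative reduction in average check point RMS error for the cases in which both procedures converged. The pairs are read from the "Ave." row of Table 5 (block adjustment first, single frame rectification second); the helper name is hypothetical.

```python
# Average check point RMS errors in meters from the "Ave." row of Table 5,
# stored as (block adjustment, single frame rectification) per case.
averages = {3: (66.2, 75.8), 4: (69.8, 72.2), 5: (67.9, 67.9)}

def relative_improvement(block_rms, single_rms):
    """Fractional reduction in RMS error obtained by block adjustment."""
    return (single_rms - block_rms) / single_rms

improvement = {case: relative_improvement(*pair) for case, pair in averages.items()}
for case, gain in improvement.items():
    print(f"case {case}: {100 * gain:.1f}% lower RMS with block adjustment")
```

The gain is roughly 13% for case 3 and 3% for case 4, and it vanishes for case 5, where there are no tie points.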




3.  EDGES  AS  CONTROL 


3.1  Edge  Points 

For  a typical  image  frame,  the  necessary  number  of  control  points  with  the  desired 
distribution  and  accuracy  is  difficult  and  sometimes  impossible  to  secure  because 
features  that  can  serve  as  control  points  are  few.  By  comparison,  edges  and  lines  occur 
more  often  and  in  combination  with  points,  the  necessary  amount  of  control  can  more 
easily  be  satisfied  if  a method  is  devised  that  can  utilize  lines  and  edges  as  control. 


A straight  edge  or  line  can  be  represented  by  a single  point  on  that  edge,  preferably 
near  the  middle,  and  a direction.  We  call  that  point  an  edge  point.  Edge  points  on  the 
ground,  or  maps  representing  the  ground,  can  be  identified  and  transferred  into  the 
corresponding  image  manually.  The  position  of  edge  points  on  the  image  can  then  be 
measured in a direction perpendicular to the edge with an accuracy comparable to
conventional points or even better. The covariance matrix for the position of the edge
point  in  the  (l,p)  coordinate  system  is 


(6)  $\Sigma_{lp} = \begin{bmatrix} \sigma_l^2 & 0 \\ 0 & \sigma_p^2 \end{bmatrix} = \sigma_p^2 \begin{bmatrix} k^2 & 0 \\ 0 & 1 \end{bmatrix}$

where:

l is parallel to the line

p is perpendicular to the line

$\sigma_p$ is the standard deviation of edge point position
perpendicular to the edge

$\sigma_l$ is the standard deviation of edge point position
along the edge

k is equal to $\sigma_l/\sigma_p$ and is assigned a very large
value

The direction of the edge, $\theta$, can also be measured on the image. The covariance
matrix of the edge point in the (r,c) coordinate system is

(7)  $\Sigma_{rc} = R_\theta \Sigma_{lp} R_\theta^{T}$

where:

r is the row direction in the image

c is the column direction in the image

$R_\theta$ is the rotation matrix with argument $\theta$
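The transformation of the edge point covariance from the edge-aligned (l,p) system to the image (r,c) system can be sketched as follows. The sign convention of the rotation and the ordering of the axes are assumptions made for illustration, as is the function name; the numerical values (0.5 pixel across the edge, 25 pixels along it) are those used later in the experiments.

```python
import numpy as np

def edge_covariance_rc(theta_rad, sigma_p, k):
    """Covariance of an edge point in image axes, given the edge direction.

    sigma_p : standard deviation perpendicular to the edge
    k       : ratio sigma_l / sigma_p (large, since position along the edge is weak)
    """
    sigma_l = k * sigma_p
    cov_lp = np.diag([sigma_l ** 2, sigma_p ** 2])  # (l, p) axes, uncorrelated
    c, s = np.cos(theta_rad), np.sin(theta_rad)
    R = np.array([[c, -s], [s, c]])  # assumed rotation convention
    return R @ cov_lp @ R.T

# Edge along the first image axis: large variance stays on that axis.
cov_0 = edge_covariance_rc(0.0, sigma_p=0.5, k=50.0)
# Edge rotated by 90 degrees: the large variance moves to the other axis.
cov_90 = edge_covariance_rc(np.pi / 2, sigma_p=0.5, k=50.0)
```

For intermediate angles the result has non-zero off-diagonal terms, which is how a single edge point constrains the adjustment mainly in the direction perpendicular to its edge.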

Another method of finding the edge point on the image is through the use of digital
correlation. First a window centered on the edge point on the map is digitized
approximately in the row-column direction of the image. This window is then correlated with
the image, with or without image pre-processing such as edge detection, resulting in
the image position of the edge point. The corresponding position covariance matrix can
then be computed using (Forstner [14]):

(8)  $\Sigma_{rc} = \sigma_n^2 \begin{bmatrix} \sum (\partial g/\partial r)^2 & \sum (\partial g/\partial r)(\partial g/\partial c) \\ \text{sym.} & \sum (\partial g/\partial c)^2 \end{bmatrix}^{-1}$

where:

$\sigma_n$ is the standard deviation of image noise

g is the density of the image inside a window
containing the edge point

r, c are the row and column numbers

$\partial g/\partial r$, $\partial g/\partial c$ are the partial derivatives of g with respect
to r and c.
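A numerical sketch of this covariance estimate is given below. It assumes the standard form in which the noise variance multiplies the inverse of the matrix of summed gradient products; the use of `numpy.gradient` for the partial derivatives, the function name, and the test window are choices made for illustration.

```python
import numpy as np

def correlation_covariance(window, noise_sigma):
    """Position covariance (row, col) of a correlation match from window gradients."""
    g = np.asarray(window, float)
    dg_dr, dg_dc = np.gradient(g)  # finite-difference partial derivatives
    normal = np.array([
        [np.sum(dg_dr ** 2),    np.sum(dg_dr * dg_dc)],
        [np.sum(dg_dr * dg_dc), np.sum(dg_dc ** 2)],
    ])
    # A window whose gradients all point one way (an ideal straight edge)
    # makes this matrix singular; some texture in both directions is needed.
    return noise_sigma ** 2 * np.linalg.inv(normal)

# Hypothetical 8 x 8 window with gradients in both row and column directions.
r, c = np.meshgrid(np.arange(8.0), np.arange(8.0), indexing="ij")
window = r ** 2 + 3.0 * c
cov = correlation_covariance(window, noise_sigma=2.0)
```

Stronger gradients inside the window shrink the covariance, and the covariance scales with the square of the image noise.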

Before the locations of edge points are transferred into the image, they are
first defined on the map or ground; hence edge points can be treated as ordinary points
as far as their ground positions are concerned. Once their image positions are defined,
edge points can be easily incorporated into existing rectification programs.

In  theory  a single  edge  point  is  enough  to  represent  a straight  edge  segment,  but  in 
practice  more  than  one  point  may  be  necessary,  especially  if  the  segment  is  not  really 
straight. 


3.2  Experiments  With  Edge  Points  as  Control 
In  Single  Frame  Rectification 

In our experiments using edges as control for rectification, we ran ten cases with
different edge distributions. Figure 4 shows a schematic representation of all ten cases.
In case (1), edge point pairs have the same coordinates and are positioned at a regular
grid. The angle between the edges in an edge pair is fixed at 90°. Case (2) is the same
as case (1), except that the acute angle between the edges in an edge pair varies
randomly within the range 60° to 90°. Case (3) is the same as case (2), except that the
range for this case is from 30° to 90°. Case (4) is the same as the previous cases except
that the direction of edges in this case is totally arbitrary. Cases (5) to (8) are the
same as cases (1) to (4), respectively, except that the position of one edge point in an
edge pair is randomly perturbed within the range -100 to +100 pixels. Case (9) is the
same as case (1) except that the position of each edge pair is now randomly
distributed over the whole image frame. Case (10) is the most general case. In this case
both the position of the edge points and the direction of edges are totally arbitrary.

The amount of contamination applied to all ten cases to simulate random errors was
the same. In the image, the ideal coordinates of edge points were perturbed using a
mixture of uniform and normal distributions along the edge direction and perpendicular
to it. The uniform distribution has a range of -0.5 to +0.5 pixel in both directions,
representing the discretization errors. The normal distribution has a standard deviation
of 0.5 pixel perpendicular to the edge and 25 pixels along the edge, representing the
identification errors. The ground positions of edge points were perturbed using the
normal distribution with standard deviation of 15 meters in each of the three coordinate
directions. The number of edge pairs for all cases varied from 25 to 145.

Figure 4  Distribution of Edges for Different Cases of Rectification
With Edges as Control.

NOTE: (1) Pairs of edge points, having the same coordinates, are
positioned at regular grid. The angle between the edges
is 90°.

(2) Same as (1), except that the angle is at least 60°.

(3) Same as (1), except that the angle is at least 30°.

(4) Same as (1), except that the angle is arbitrary.

(5)-(8) Same as (1)-(4) respectively, except that the
coordinates of edge points randomly deviate from
regular grid up to 100 pixel.

(9) Same as (1), except that the position of a pair of
edge points is totally random.

(10) Both the position of an edge point and the direction
of the corresponding edge are arbitrary.

Check  points  were  used  to  measure  the  accuracy  of  rectification.  There  were  144 
check  points  situated  on  a uniform  grid.  For  comparison  purposes,  the  same  check 
points  were  used  for  all  cases.  The  image  position  of  check  points  were  perturbed  in 
the  same  manner  as  edge  points,  except  that  the  perturbations  were  applied  in  the  row 
and  column  direction  instead  of  along  the  edge  and  perpendicular  to  it  and  that  the 
standard  deviation  for  the  normal  component  for  the  row  and  column  direction  were 
both  0.5  pixel.  The  ground  position  of  check  points  were  perturbed  in  exactly  the  same 
manner  as  those  for  edge  points. 
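The image perturbation of an edge point, applied along and perpendicular to the edge, can be sketched as below. This is an illustrative reading of the description, not the simulation code: the uniform and normal components are assumed to be added, the edge direction is assumed to be measured from the first image axis, and the function name is hypothetical.

```python
import numpy as np

def perturb_edge_point(rc, theta_rad, rng, sigma_perp=0.5, sigma_along=25.0):
    """Perturb an ideal edge point position in the edge-aligned directions."""
    along = np.array([np.cos(theta_rad), np.sin(theta_rad)])  # unit vector along edge
    perp = np.array([-np.sin(theta_rad), np.cos(theta_rad)])  # unit vector across edge
    # Identification error (normal) plus discretization error (uniform), per direction.
    d_along = rng.normal(0.0, sigma_along) + rng.uniform(-0.5, 0.5)
    d_perp = rng.normal(0.0, sigma_perp) + rng.uniform(-0.5, 0.5)
    return np.asarray(rc, float) + d_along * along + d_perp * perp

rng = np.random.default_rng(0)
perturbed = perturb_edge_point((100.0, 200.0), np.deg2rad(30.0), rng)
```

Check points would be handled the same way, except that both components are applied in the row and column directions with a 0.5 pixel standard deviation.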

Each case in Figure 4 is replicated ten times using independent perturbations. A
tabulation of the average rectification accuracy and the corresponding standard
deviation is shown in Table 6. The average rectification accuracy for all cases is also
shown in Figures 5 to 8. In these figures, the abscissa is the number of edge pairs and
the ordinate is the average rectification accuracy of the ten replicates in meters. Each
curve corresponds to the case number as annotated in the figures.

Figure  5 shows  the  results  from  cases  (1)  to  (4).  The  only  difference  between  these 
cases  is  the  angle  between  the  edges  in  an  edge  pair.  It  can  be  seen  from  the  figure 
that  decreasing  the  angle  between  edge  pairs  results  in  a corresponding  decrease  in 
Table 6  Mean and Standard Deviation in Meters of the RMS Errors at Check Points Using Edge Points as
Control
(Each Case Consists of Ten Replicates)

No. of           Case 1            Case 2            Case 3            Case 4            Case 5
Line Pairs      Mean Std.Dev.     Mean Std.Dev.     Mean Std.Dev.     Mean Std.Dev.     Mean Std.Dev.
-----------------------------------------------------------------------------------------------------
25             73.17     1.13    79.85     2.22                     105.66     8.86    69.92
41             69.24     0.67    75.24     2.38                      85.76     2.82    67.94
81             65.81     0.74    69.26     1.22                      75.22     1.46    65.63
145            64.59     0.71        -        -                      69.29     1.08

No. of           Case 6            Case 7            Case 8            Case 9            Case 10
Line Pairs      Mean Std.Dev.     Mean Std.Dev.     Mean Std.Dev.     Mean Std.Dev.     Mean Std.Dev.
-----------------------------------------------------------------------------------------------------
25             77.23     2.32    80.41     2.39    95.35     4.57   154.34    27.06   211.57    29.27
41             72.33     1.50    76.69     1.79    81.52     3.06    72.71     2.30    94.14     5.15
81             66.70     1.27    71.75     1.35    70.47     1.21    66.49     1.41    73.29     1.56
145                -        -        -        -    67.61     0.83    64.30     1.53    68.83     1.14

NUMBER  OF  EDGE  PAIRS 

Figure  5 Plot  of  Rectification  Results  for  Cases  (1)  to  (4) 

NOTE: Case (1): Pairs of edge points, having the same coordinates, are positioned
at regular grid. The angle between the edges is 90°.

Case (2): Same as (1), except that the angle is at least 60°.

Case (3): Same as (1), except that the angle is at least 30°.

Case (4): Same as (1), except that the angle is arbitrary.

Figure 6  Plot of Rectification Results for Cases (5) to (8).

NOTE: Case (5): Edge points in an edge pair deviate randomly from regular grid up to
100 pixel. The angle between edges is 90°.

Case (6): Same as (5), except that the angle is at least 60°.

Case (7): Same as (5), except that the angle is at least 30°.

Case (8): Same as (5), except that the angle is arbitrary.
Figure 7  Plot of Rectification Results for Cases (1), (5), and (9).

NOTE: Case (1): Pairs of edge points, having the same coordinates, are positioned at
regular grid. The angle between edges is 90°.

Case (5): Same as (1), except that the coordinates of edge points randomly
deviate from regular grid up to 100 pixel.

Case (9): Same as (1), except that the position of a pair of edge points is
totally random.



NUMBER  OF  EDGE-PAIRS 

Figure  8 Plot  of  Rectification  Results  for  Cases  (4),  (8),  and  (10). 

NOTE:  Case  (4):  Pairs  of  edge  points  have  the  same  coordinates  and  are  positioned  at  a 

regular  grid.  The  angle  between  edges  is  arbitrary. 

Case  (8):  Same  as  (4),  except  that  the  coordinates  of  edge  points  randomly 

deviate  from  regular  grid  up  to  100  pixel. 

Case  (10):  The  position  of  edge  points  and  the  direction  of  edges  are  totally 

arbitrary. 
rectification accuracy. This result is essentially repeated in Figure 6, because the only
difference between these two figures is that the distance between edge points in an edge
pair in all cases shown in Figure 5 is fixed at 0 while that for Figure 6 ranges up to 200
pixels. Comparing cases (1) and (4), or (5) and (8), in Figures 5 and 6 shows that about
2 times more edge pairs are necessary to achieve the same accuracy as with
conventional control points alone.

The effect of the distance between edge points in an edge pair is shown in Figures 7
and 8. Figure 7 shows cases (1), (5), and (9) where the angle between edges in an edge
pair is fixed at 90°. Figure 8 shows cases (4), (8), and (10) where the angle is totally
arbitrary. Separating the edges in edge pairs is beneficial up to a certain point. Total
random distribution of edges over the whole image frame is inferior to other
distributions when control edges are few.

Figure 9 is a comparison between cases (9) and (10). In case (9), where edge points
in an edge pair have the same image coordinates and the pair of edges intersect at 90°,
an edge pair is equivalent to a single control point. Case (10), where edges have totally
arbitrary direction and distribution over the whole image frame, is the most extreme of
all the ten cases studied. It can be seen from the figure that in order to achieve
rectification accuracy when using edges comparable to that achieved when using
conventional points, the number of edge pairs should be approximately 3 times the number
of control points. Thus it is worthwhile to attempt selecting well distributed control
features.

Figure 9  Plot of Rectification Results for Cases (9) and (10).

NOTE: Case (9): Pairs of edge points, having the same coordinates, are randomly distributed
over the whole image frame. The angle between edges is 90°.

Case (10): The position of edge points and the direction of edges are totally arbitrary.

Summarising the results of this approach, it has been shown that edge points can
efficiently replace conventional control points. As they are much more likely to be
found in an image and can be measured with at least the same precision as
conventional points, one can expect that the overall rectification accuracy may even be
improved.

For a practical implementation, especially to decrease the requirements on the skill
of the operator, one should provide automatic algorithms for finding corresponding
control features, for both conventional and edge points. This addresses the problem of
scene matching. The next section is devoted to this problem and presents an algorithm
which is developed first for finding corresponding tie points in overlapping image
frames.
4.  ROBUST  ESTIMATION  FOR  CORRESPONDENCE

4.1  A Hierarchical  Approach  to  Correspondence

Scene matching is a basic requisite for different tasks which use the geometric
properties of images, such as terrain classification, the derivation of digital height models,
or map production. It is also the first step in applications where images are used for
determining individual points in three dimensions, as in photogrammetric triangulation.
In all these cases one image is related either to another image (i.e. registration) or to a
topographic map (i.e. rectification).

Obviously there is no simple way to accomplish this task in one step. One rather
has to pass through several levels in a hierarchical way, where the results of one level
are the approximations for the next. This is similar to the way the human visual system is
believed to behave (Marr [28]). If one starts with a satellite image with a relative
resolution of, say, $10^{-4}$, i.e. $10^{4}$ pixels per line, one could imagine a 4 step procedure, where
each step increases the precision of rectification by about one order of magnitude:

1.  A global image match which defines the position and the orientation to an
accuracy of 2-10%, i.e. 200-1000 pixels, and 1-5°. This task is usually done by an
operator but may use the very efficient algorithm by Lambird et al. [25] (see also
Stockmann et al. [40]).
2.  In order to approximately compensate for unknown sensor position and attitude,
and for relief displacements due to undulations of the terrain, one might continue
with the matching of image patches. The number of these patches will depend on
the roughness of the terrain in comparison to the flying height, and range from a
few, say 5 or 10, to a hundred. The size of the image patches will be chosen in a
way that the expected displacement will be less than half the linear patch size,
thus between 4 and 20% of the side of the image. In order to keep the amount of
data in a reasonable range one will use a reduced resolution, say between 1/2 and
1/8, leading to linear patch sizes of 50-200 pixels. The algorithm should be able
to compensate for at least linear, i.e. affine, distortions, and should lead to
accuracies of 2-10 pixels, referring to the original image. Since high accuracy is not
required, one might use methods of structural pattern recognition to
advantage by extracting scene features. One of the most promising algorithms
for this step is the one by Barnard and Thompson [4].

3.  Since fine correlation using differential methods requires approximate values
which are within 1.5 pixels of the final match (Forstner [14]), an intermediate step
is necessary. Here, all correlation-based methods can efficiently be used as the
search area is very small. The window size will range between 16 and 32 pixels
(linear). The aim in this step is to choose a fast, robust, and reliable algorithm.
Possible candidates for this task are sequential algorithms (Barnea and Silverman
[5]), the phase correlation technique (Kuglin and Hines [24]) or binary correlation.
But of course the algorithm of step 2 could be used here too.

4. Fine correlation in the last step may yield subpixel accuracy, if the texture in the
image allows and if it is required for the final product. Here, differential
algorithms (Cafforio and Rocca [8], cf. Förstner [14]) are most efficient. The window
size, depending on the texture, will range between 8 and 32 pixels. If the pixel
size is adapted to the (spatial) spectrum of the images, accuracies of 0.2 pixels or
better can be reached under production conditions (Bernstein [7], McGillem and
Svedlow [29], Förstner [15]).

This sequence of steps has of course to take the special boundary conditions of the
production into account, and may be varied accordingly. The main steps, however, will
have to use similar algorithms. The concept is quite different from those used for the
rectification of aerial images for orthophoto production, e.g. using the Gestalt Photo
Mapper or the approach by Panton [35]. These systems do not have to cope with the
weak geometry of satellite imagery and thus need only a few control points. They can
therefore use the internal geometry of the stereo pair for recursively updating the
approximations for the fine correlation. Unlike these procedures, the hierarchical
setup described above allows a great deal of parallelism in steps 2-4.

From the above mentioned algorithms the one by Barnard and Thompson needs
further discussion. Its general line of thought can also be found in the approach by
Lambird et al. [25] and in the procedure by Marr, Poggio and Grimson (Marr [28],
Grimson [17], cf. Kak [20]).

With  respect  to  its  application  in  registration  and  rectification,  the  generality  of 
Barnard  and  Thompson’s  geometric  model  turns  out  to  be  a disadvantage  due  to  the 
resulting  high  numerical  effort.  Since  the  second  step  in  the  hierarchical  procedure  for 
registration  and  rectification  is  decisive  for  its  reliability,  this  step  has  to  be  designed 
such  that  the  actual  data  structure  is  taken  into  account,  and  it  must  also  be  flexible. 
Therefore  a new  algorithm  has  been  developed,  which  can  be  used  for  registration  and 
in  particular  for  selecting  tie  points  for  rectifying  overlapping  image  frames. 

We  will  first  formulate  the  problem  of  matching  image  patches  of  moderate  sizes  in 
subsection  4.2,  discuss  two  of  the  algorithms  and  sketch  the  new  one.  Subsection  4.3 
then  describes  the  concept  of  the  algorithm  in  detail.  Subsection  4.4  is  devoted  to  the 
actual  implementation  and  subsection  4.5  contains  an  example,  to  demonstrate  the  per- 

4.2 The Problem of Correspondence


Let two images, or image patches, $I'$ and $I''$ be given. Points $i'$ and $i''$ in images $I'$
and $I''$ may have the coordinates $z' = (x', y')^T$ and $z'' = (x'', y'')^T$; thus, $z'$ and $z''$ are
vectors, where $T$ stands for transposed.

It is assumed that if $i'$ and $i''$ are corresponding points, their coordinates can be
related by

(9) $z'' = t(z'; p)$

where:

$t$ is an arbitrary mapping function; it might reflect the knowledge about the
geometric relation between the images $I'$ and $I''$; and

$p$ is a vector of unknown parameters $p_1, \ldots, p_u$.


It  may  be  viewed  as  a severe  restriction,  that  the  mapping  function  must  have  an 
analytical  form.  But  one  should  keep  in  mind  that  also  a stochastic  and/or  segment- 
wise  continuous  function  can  be  brought  into  the  form  of  eq.  (9).  Eq.  (9)  will  cause  no 
difficulties,  particularly  in  small  scale  imagery. 

For an arbitrary pair of points $(i', i'')$ there are two states of interest:

A. $i'$ and $i''$ are corresponding points

B. $i'$ and $i''$ are not corresponding points

The problem of correspondence now simply consists of: 1) finding the corresponding
points; and 2) determining the parameter vector, $p$, of the mapping function.
Theoretically the solutions to 1) and 2) are equivalent, as 1) implies 2) if applied to all pixels,
say in $I'$. But this is neither feasible nor necessary, as the mapping function can
reasonably be assumed to be smooth, i.e. roughly speaking bandlimited, and only a small
number of corresponding points is sufficient to describe the mapping function. Whereas
these pairs of corresponding points might replace the parameters $p$, the mapping
function is necessary if interpolation is required.

The known approaches actually use only a very limited number of points and
explicitly or implicitly a mapping function of the type in eq. (9). In order to reduce the
numerical effort and at the same time increase the reliability, objects $o'$ and $o''$ are
used in both images, with feature vectors $f'$ and $f''$ attached to them in addition to the
coordinates $z'$ and $z''$:

(10) $o' = o'(z'; f')$ and $o'' = o''(z''; f'')$

The procedures typically consist of three steps:

a. selection of appropriate objects $o'$ and $o''$;

b. determining the similarity between all objects $o'$ in image $I'$ and all objects $o''$ in
image $I''$, yielding possible candidates for corresponding objects;

c. using some context information to find the pairs $(o', o'')$ of corresponding objects.

4.2.1 The LNK-Method (Lambird et al. [25], Stockmann et al. [40])

a. Using edge detection procedures, this method selects objects which are either
points or point pairs. Points belong to 4 classes. Pairs of such points are called
abstract edges, abstract because the connecting line need not be a real edge in the
image. For simplicity, we restrict the discussion to the point objects. Thus, an
object $o'$, say, in image $I'$ is represented by its coordinates $z'$ and its class $\omega'$:
$o' = o'(z'; \omega')$.

b. Among all possible pairs $(o', o'')$ of objects, only those which belong to the same
class are selected as possible candidates. Thus, if $\omega' = \omega''$ the objects
$o'$ and $o''$ are said to be similar.

c. The aim of the procedure is to determine the unknown parameters of the
geometric transformation, which in this case consist of the two shifts in x- and
y-directions. Each pair of similar objects leads to an equation $z'' = t(z'; p) = z' + p$ which
can be solved for $p$. The estimate $\hat{p}$ for the true shift $p$ is taken from the
histogram of all $p = z'' - z'$ by searching for the peak value representing the most
probable shift. At the same time one obtains a classification of the object pairs
into the two classes: $\omega_A$ of corresponding points and $\omega_B$ of non-corresponding
points.
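
The histogram search of step c can be sketched as follows. This is only an illustration under simplifying assumptions: the function name `estimate_shift`, the tuple layout `(x, y, cls)`, and the use of integer pixel coordinates as histogram bins are hypothetical and not taken from the cited papers.

```python
from collections import Counter

def estimate_shift(objects1, objects2):
    """Vote for the most frequent shift between same-class point objects.

    objects1, objects2: lists of (x, y, cls) with integer pixel coordinates.
    Returns (shift, votes, pairs); pairs holds the object pairs that agree
    with the winning shift, i.e. the candidates for corresponding points.
    """
    histogram = Counter()
    for x1, y1, c1 in objects1:
        for x2, y2, c2 in objects2:
            if c1 == c2:  # only similar objects (same class) cast a vote
                histogram[(x2 - x1, y2 - y1)] += 1
    if not histogram:
        raise ValueError("no pair of objects shares a class")
    shift, votes = histogram.most_common(1)[0]  # peak of the histogram
    pairs = [((x1, y1), (x2, y2))
             for x1, y1, c1 in objects1
             for x2, y2, c2 in objects2
             if c1 == c2 and (x2 - x1, y2 - y1) == shift]
    return shift, votes, pairs

# Three true correspondences shifted by (3, 4) plus one spurious object.
first = [(0, 0, 1), (5, 2, 2), (9, 9, 3)]
second = [(3, 4, 1), (8, 6, 2), (12, 13, 3), (1, 1, 2)]
shift, votes, pairs = estimate_shift(first, second)
```

Every same-class pair votes once, so the cost grows with the product of the object counts, while the peak stays sharp as long as the true shift collects more votes than any accidental one.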

The approach is a direct solution, where no iterations are necessary. A further
advantage is the sharp peak in the histogram, which guarantees a reliable solution even if the
number of objects is large, i.e. even when the background noise is considerable. On
the other hand, the method requires that eq. (9) in an extended form
$t(z', f'; p)$ is solvable for $p$. Thus, if more than two parameters have to be estimated, the
object has to contain additional geometric features such as length and orientation in $f'$
and $f''$, thus requiring more complex objects, such as lines, triangles, etc., to be
extracted from the image. This might not only increase the number of combinations
but also requires an additional dimension of the histogram for each additional unknown
parameter. Nevertheless, a primary advantage is the absence of requirements for
approximate values. Thus, with say 4 parameters, the images might have any relative
orientation and scale. This method is therefore highly recommended for step 1 in the
hierarchical scene matching procedure.

4.2.2 The Barnard-Thompson Algorithm [4]

a. This algorithm starts from objects which are represented by the gray level matrix
$g'$, say, centered at distinct points: $o'(z', g')$. The selection uses the interest
operator by Moravec, namely the minimum variance of the gray level differences
in the four main directions. This guarantees that no points on edges are selected,
which are not discernible from neighbouring points on the same edge.

b. The similarity measure uses the coordinates and the gray level differences under
consideration, deriving an initial probability that two objects $o'(z', g')$ and
$o''(z'', g'')$ correspond, i.e. that $(o', o'')$ belongs to the class $\omega_A$ of corresponding points:

$P((o', o'') \in \omega_A) = f(z' - z'', g' - g'') \propto 1/|g' - g''|^2$ if the shift $|z' - z''|$ is less than a
threshold, and $P((o', o'') \in \omega_A) = 0$ elsewhere.

c. The model of the geometric transformation is a differential one. They assume
that the scene is regionwise smooth: $z'' - z' = t(z')$ with the derivative $dt/dz'$
being bounded, except for the borders of the regions. The bound for $\Delta t/\Delta z'$
(being 1 pixel for $\Delta z' < 15$ pixels) is used to update the initial probabilities using
a relaxation scheme (Rosenfeld et al. [38]).


The model is extremely flexible, due to the randomness of the derivative within the
admissible bound. The method can further be generalized by using more complex
objects, e.g. the abstract edges of the LNK-method, and thus can be an excellent
solution for step 2 in the hierarchy. The numerical effort and the quality of the result,
however, are highly scene dependent. In particular, the number and distribution of the
selected objects are critical for the reliability of the result. Also the complexity of the
geometrical model might not be necessary for satellite or aerial imagery of moderate
scale (say, <1:20,000). This raises the question whether the numerical effort resulting
from the relaxation process can be reduced if one takes the simpler geometry of
"far-range" imagery into account.

Though both procedures follow the same general concept, their techniques are
essentially different. The simple geometric model on which the LNK-Method is based
allows a fully consistent line of thought. This makes a statistical evaluation of the
result feasible, e.g. using the broadness of the peak in the histogram. On the other
hand, though the procedure of Barnard and Thompson is excellently motivated, it is
heuristic. This prevents a thorough evaluation of its results.

4.2.3 Outline of the New Procedure

The new solution for the correspondence problem essentially aims at a
maximum-likelihood estimation of the unknown parameters $p$ of the geometrical transformation.
It follows the same three steps as the procedures described above. An attempt has been
made to derive the three steps on a common theoretical basis, and at the same time
make it amenable to generalizations for rectification:

a. The same objects are used as in the Barnard-Thompson algorithm, namely points
with their gray level matrix. The selection is guided by the theoretical precision
expected from cross-correlation. It turns out that this selection principle is
closely related to Moravec's interest operator.


b.  The  similarity  of  pairs  of  objects  is  also  based  on  the  theoretical  precision.  In 
addition  to  the  gray  level  difference  between  the  two  objects,  the  texture  is  taken 
into  account,  namely  the  variance  of  the  gradient.  Moreover,  the  formulation 
allows  the  introduction  of  correlation  measures  from  any  feature  vectors,  possibly 
including  structural  features.  Thus,  very  general  similarity  measures  can  be  used 
without  losing  the  relation  to  the  geometrical  model. 

c. The maximum likelihood estimation for the parameters $p$ of the mapping function
requires the knowledge of the probability density function of the observations.
Observations in this case are the coordinate differences $\Delta z$ from the modified form
$\Delta z = z'' - z' = t(z'; p)$ of eq. (9). The coordinate differences of corresponding
points can reasonably be assumed to be normally distributed, whereas the
coordinate differences of non-corresponding points are approximately uniformly distributed
between $-d$ and $+d$, where $d$ is the dimension of the image patch. These
observations can therefore be interpreted as outliers or blunders with respect to the
model eq. (9). As the redundancy of the system is rather high, robust estimation
procedures should work efficiently in this case. The high percentage of outliers,
i.e. non-correspondences, is compensated by the non-similarity of the objects, which
leads to a low initial weight of these observations.

4.3 Mathematical Model

This section provides the mathematical model for the correspondence algorithm.
We will start with the mapping functions and the robust estimation procedure for the
determination of the unknown parameters. The similarity measure then leads us to the
interest operator used for the point selection.

4.3.1 Mapping Functions

The relation between two image segments of a satellite or aerial image can be
approximated by a low degree polynomial:

Shift only

(11 a,b) $\underline{z}'' = a + z'$ or $\Delta\underline{z} = a$

(Stochastical variables are underscored.)

Affine transformation

(12 a,b) $\underline{z}'' = a + B z'$ or $\Delta\underline{z} = a + \bar{B} z'$

Second order polynomial

(13 a,b) $\underline{z}'' = a + B z' + C\, z' \otimes z'$ or $\Delta\underline{z} = a + \bar{B} z' + C\, z' \otimes z'$

with

$a = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix}$; $B = \begin{pmatrix} p_3 & p_4 \\ p_5 & p_6 \end{pmatrix}$; $C = \begin{pmatrix} p_7 & p_8 & p_8 & p_9 \\ p_{10} & p_{11} & p_{11} & p_{12} \end{pmatrix}$;

$(z' \otimes z')^T = (x'x' \;\; x'y' \;\; y'x' \;\; y'y')$;

$\bar{B} = B - I$.

By introducing conditions on the parameters $p_i$ one may restrict the mapping
functions to conformal ones. For example, the conditions $p_3 = p_6$ and $p_4 = -p_5$ in eq. (12)
lead to a similarity transformation with shifts, scale, and rotation only. The
transformation parameters occur only linearly in the mapping function and thus could be
solved for in one step using the least squares technique.
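
As an illustration of eqs. (11) to (13), the three mapping functions can be evaluated with one small routine. The function names and the nested-tuple layout of $a$, $B$, and $C$ are assumptions made for this sketch; the parameter arrangement follows the matrices given above.

```python
import math

def kron(z):
    """Return z (x) z = (xx, xy, yx, yy) for a 2-vector z."""
    x, y = z
    return (x * x, x * y, y * x, y * y)

def apply_mapping(z, a, B=((1.0, 0.0), (0.0, 1.0)), C=None):
    """Evaluate z'' = a + B z' + C (z' (x) z').

    With the default B (identity) and C=None this is the pure shift (11),
    with a general B the affine transformation (12), and with a 2x4 matrix C
    the second order polynomial (13).
    """
    x, y = z
    out = [a[0] + B[0][0] * x + B[0][1] * y,
           a[1] + B[1][0] * x + B[1][1] * y]
    if C is not None:
        k = kron(z)
        for r in range(2):
            out[r] += sum(C[r][j] * k[j] for j in range(4))
    return tuple(out)

def similarity_matrix(scale, angle):
    """B restricted by p3 = p6 and p4 = -p5 (scale and rotation only)."""
    c = scale * math.cos(angle)
    s = scale * math.sin(angle)
    return ((c, -s), (s, c))

shifted = apply_mapping((2.0, 3.0), (1.0, 1.0))
rotated = apply_mapping((1.0, 0.0), (0.0, 0.0), similarity_matrix(2.0, math.pi / 2))
quadratic = apply_mapping((2.0, 3.0), (0.0, 0.0),
                          C=((1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0)))
```

Because every parameter enters linearly, each observed pair contributes two linear equations in the unknowns, which is what allows a one-step least squares solution.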


4.3.2 Robust Estimation

The least squares technique starts from the linear (or linearized) model

(14) $E(\underline{l}) = A x$, i.e. $E(\underline{l}_i) = a_i^T x$; $D(\underline{l}) = C_{ll} = \sigma_0^2 Q_{ll}$

where the $n \times 1$ vector $\underline{l}$ contains the observations, in our case the coordinate differences
$\Delta z$, with their covariance matrix $C_{ll}$. It is usually split into the unknown variance
factor $\sigma_0^2$ and the known coefficient matrix $Q_{ll}$. The $n \times u$ design matrix $A$, having rows $a_i^T$,
is supposed to be known; $x$ are the unknown parameters.

If the observations can be assumed to be uncorrelated, then one uses the weights $w_i$
or the weight matrix $W = \mathrm{diag}(w_i) = \mathrm{diag}(1/q_{ii})$ to advantage, to solve the minimum
problem

(15) $\sum_i (a_i^T \hat{x} - l_i)^2 w_i = \sum_i v_i^2 w_i \rightarrow \min.$

It is known that the estimated parameters $\hat{x}$ are sensitive to errors in the model eq. (9),
especially gross errors or outliers in the observations. This is due to the fact that the
solution to eq. (15) is also the maximum likelihood estimator for $x$ if the observations
are normally distributed. Observations with outliers, however, can be viewed as belonging
to longer tailed distributions. Examples are the Laplace distribution $f(x) = c\, e^{-|x|}$ and
the Cauchy distribution $f(x) = c/(1 + x^2)$.

In order to eliminate the effect of outliers on the result one can use
maximum-likelihood type estimators. Then, instead of the sum of the squares of the residuals $v_i$,
the sum of a less increasing function $\rho(v_i)$ is minimized (Huber [19]):

(16) $\sum_i \rho(a_i^T \hat{x} - l_i) = \sum_i \rho(v_i) \rightarrow \min$

Discussion:

1. Choosing $\rho(v) = v^2/2$ gives the least squares estimator.

2. Choosing $\rho(v) = \frac{1}{p} |v|^p$ yields the estimator minimizing the $L_p$-norm. A special
case is obtained for $p = 1$: Minimizing $\rho(v) = |v|$ is the well known least sum
method, being the ML-estimate for the Laplace distribution. It is the
multiparameter version of the median. Barnea and Silverman [5] used it for cross
correlation.

3. The choice of $\rho$ can be guided by the evaluation of the "Influence Curve" $IC(v)$
(Hampel [18]), being proportional to the derivative $\psi(v) = d\rho/dv$ of the minimum
function. $IC(v)$ or $\psi(v)$ give an indication of how strong the influence of an
outlier on the estimate $\hat{x}$ is.

4. The solution of eq. (16) can use existing programs for least squares solution, by
either modifying the residuals, $v_i^* = \sqrt{\rho(v_i)}$, or by modifying the weights:

(17) $\sum_i \rho(v_i) = \sum_i \frac{\rho(v_i)}{v_i^2/2} \cdot \frac{v_i^2}{2} = \sum_i w(v_i) \frac{v_i^2}{2} \rightarrow \min$

using the weight function

(18) $w(v_i) = \frac{\rho(v_i)}{v_i^2/2 + c}$ $(c \ll 1)$

In an iterative solution the weights of all observations are updated depending on
their residuals from the previous iteration:

(19) $w_i^{(\nu+1)} = w_i^{(0)} \, w(v_i^{(\nu)})$

5. If the function $\rho(v)$ is convex, thus $\psi(v)$ non-decreasing, and the model is linear,
then convergence is guaranteed under broad conditions.
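
The reweighting scheme of eq. (19) can be illustrated for the simplest model, a pure shift, where each weighted least squares step reduces to a weighted mean. The sketch below makes several assumptions that are not in the text: the name `robust_shift`, the use of the Euclidean length of the two-dimensional residual as $v_i$, the scale parameter `sigma`, a fixed number of iterations, and a smooth convex weight function chosen so that the weight stays finite at $v = 0$.

```python
import math

def smooth_l1_weight(v):
    """Weight rho(v) / (v^2/2) for rho(v) = 2 (sqrt(1 + v^2/2) - 1).

    Written in the algebraically equivalent form 2 / (1 + sqrt(1 + v^2/2)),
    which avoids the 0/0 at v = 0 and tends to 1 there.
    """
    return 2.0 / (1.0 + math.sqrt(1.0 + v * v / 2.0))

def robust_shift(dz, w0, sigma=1.0, iterations=10, weight=smooth_l1_weight):
    """Iteratively reweighted estimate of a common shift a from differences dz.

    dz: list of (dx, dy) coordinate differences, w0: positive initial weights.
    Each pass solves the weighted least squares problem (a weighted mean) and
    then resets the weights as w_i = w0_i * weight(v_i), following eq. (19).
    """
    if iterations < 1:
        raise ValueError("at least one iteration is required")
    w = list(w0)
    for _ in range(iterations):
        total = sum(w)
        ax = sum(wi * d[0] for wi, d in zip(w, dz)) / total
        ay = sum(wi * d[1] for wi, d in zip(w, dz)) / total
        v = [math.hypot(d[0] - ax, d[1] - ay) / sigma for d in dz]
        w = [w0i * weight(vi) for w0i, vi in zip(w0, v)]
    return (ax, ay), w

# Four consistent differences near (3, 4) and one gross outlier.
dz = [(3.0, 4.0), (3.1, 3.9), (2.9, 4.1), (3.0, 4.0), (20.0, -7.0)]
estimate, weights = robust_shift(dz, [1.0] * len(dz))
```

With equal initial weights the outlier is only damped, not removed, so the estimate still leans slightly toward it; low initial weights for dissimilar pairs or a redescending weight function reduce that pull further.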


Minimizing the $L_1$-norm thus seems to be optimal, as it is robust, and convergence is
guaranteed. This method however has two disadvantages:

1. $\rho(v)$ has no derivative at 0; thus the influence curve is not continuous, which does
not guarantee a unique solution.

2. The influence curve $\psi(v) = \mathrm{sign}(v)$ is not zero for large values; thus large outliers
still have an influence on the result, which is not desirable.

We therefore propose to use the following weight functions.

1. In order to ascertain convergence we slightly modify the minimum function of the
$L_1$-norm (cf. Figure 10).

(20 a) $\rho_1(v) = 2 \left( \sqrt{1 + v^2/2} - 1 \right)$

(20 b) $w_1(v) = \frac{4 \left( \sqrt{1 + v^2/2} - 1 \right)}{v^2}$

(20 c) $\psi_1(v) = \frac{v}{\sqrt{1 + v^2/2}}$

$\rho_1(v)$ is strictly convex with decreasing curvature for large $v$.


[Figure 10, legend. 0: least squares, non-robust ($\psi(v)$ not bounded); 1: $L_1$-norm, robust,
convergence guaranteed; 2: redescending IC, a: ML-estimator for Cauchy distribution,
b: exponential weight function (Krarup)]


2. After having reached convergence, one can assume to have good approximate
values for the parameters. In order to eliminate the influence of large outliers one
could take one of the following two minimum functions:

$\rho_{2a}$ leads to maximum-likelihood estimators starting from a Cauchy distribution:

(21 a) $\rho_{2a}(v) = \ln(1 + v^2/2)$

(21 b) $w_{2a}(v) = \frac{\ln(1 + v^2/2)}{v^2/2}$

(21 c) $\psi_{2a}(v) = \frac{v}{1 + v^2/2}$

No convergence is guaranteed in the general case. Also, as $\psi$ is descending for
large $v$, no unique solution is guaranteed if arbitrary approximate values are
allowed. This is meaningful as the Cauchy distribution has neither mean nor
variance.

The following minimum function is proposed by Krarup et al. [22], which
considerably reduces the weights of false observations due to its exponential form:

This weight function fulfills practically all requirements for a well behaved weight
function (Hampel [18], Werner [43]). The functions are shown in Figure 10
together with the minimum, weight, and influence functions of the least squares,
$\rho_0(v) = v^2/2$.
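
The relations between minimum function, weight function, and influence function in eqs. (20) and (21) can be checked numerically. The following sketch is only a consistency check: the function names and the finite-difference step are choices made here, and the weight is evaluated as $\rho(v)/(v^2/2)$ with the limiting value 1 at $v = 0$.

```python
import math

def rho1(v):
    """Modified L1 minimum function, eq. (20 a)."""
    return 2.0 * (math.sqrt(1.0 + v * v / 2.0) - 1.0)

def psi1(v):
    """Influence function of rho1, eq. (20 c)."""
    return v / math.sqrt(1.0 + v * v / 2.0)

def rho2a(v):
    """Minimum function related to the Cauchy distribution, eq. (21 a)."""
    return math.log1p(v * v / 2.0)

def psi2a(v):
    """Influence function of rho2a, eq. (21 c)."""
    return v / (1.0 + v * v / 2.0)

def weight(rho, v):
    """w(v) = rho(v) / (v^2/2); both functions above tend to 1 at v = 0."""
    return 1.0 if v == 0.0 else rho(v) / (v * v / 2.0)

def numeric_derivative(f, v, h=1e-6):
    """Central difference approximation of df/dv."""
    return (f(v + h) - f(v - h)) / (2.0 * h)

# psi should equal d(rho)/dv for both function pairs.
max_error = max(
    max(abs(numeric_derivative(rho1, v) - psi1(v)),
        abs(numeric_derivative(rho2a, v) - psi2a(v)))
    for v in (0.5, 2.0, 10.0))
```

The check also shows the qualitative difference between the two choices: `psi1` increases monotonically toward the bound $\sqrt{2}$, while `psi2a` has its maximum at $v = \sqrt{2}$ and descends afterwards, so very large residuals lose their influence.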

4.3.3 Similarity Measure

The estimation procedure requires initial weights for the observations, which in our
case are the coordinate differences $\Delta z$ of object or point pairs that need not
correspond. Hence, the majority of the observations are outliers and assuming equal
weights would prevent the solution from getting started.

Reasonable weights can be obtained from the covariance matrix of the estimated
shifts $\Delta z$ that would result if we applied cross correlation to all pairs of points. It is
given by (Förstner [14])

(23) $V(z'' - z') = \mathrm{Cov}\begin{pmatrix} x'' - x' \\ y'' - y' \end{pmatrix} = \hat{\sigma}_{\Delta g}^2 \begin{pmatrix} \sum g_x^2 & \sum g_x g_y \\ \sum g_y g_x & \sum g_y^2 \end{pmatrix}^{-1} = \hat{\sigma}_{\Delta g}^2 \, Q$

where:

$g$ is the gray level function of the object, restored from $g'$ and $g''$,

$\hat{\sigma}_{\Delta g}^2$ is the estimated variance of the gray level differences, and

$g_x$, $g_y$ are the gradients of $g$ in x- and y-directions respectively.

The covariance matrix fully describes the precision of the match between the gray level
functions $g'$ and $g''$ of the two objects $o'$ and $o''$. This precision depends on:

1. The number of pixels used.

2. The noise variance.

3. The texture of the object, namely the edge busyness. It can be shown that this
measure is directly related to the bandwidth of the signal and the curvature of
the cross correlation function (Förstner [14]).

The covariance matrix can be visualized by an error ellipse (cf. Mikhail [34]), giving the
precision of the match for all directions. A good match therefore must fulfill the
following two requirements:

C1: The error ellipse should be close to a circle, otherwise the
match is not well determined in one direction, e.g. at an edge.

C2: The error ellipse should be small.

Both criteria will be used for the measure of similarity between two objects and the
selection of interesting points.

If the ellipse is close to a circle the weight can be directly derived from the trace of
the covariance matrix

(24) $w = \frac{1}{\mathrm{tr}(V)} = \frac{1}{\hat{\sigma}_{\Delta g}^2 \, \mathrm{tr}(Q)}$

Observe that the trace is invariant to rotations. Taking the gray level differences
directly to estimate $\sigma_{\Delta g}^2$ has the disadvantage of being biased if the two images have
different brightness and contrast. The correlation coefficient is known to be a better
measure. Now, if one for simplicity assumes the images $g'$ and $g''$ to be related to the
true image $g$ by $g' = a'(g + n') + b'$ and $g'' = a''(g + n'') + b''$, with $\sigma_{n'} = \sigma_{n''} = \sigma_n$,
where $a$ and $b$ represent contrast and brightness, the signal to noise ratio
$SNR^2 = \sigma_g^2/\sigma_n^2$ is functionally related to the correlation coefficient by:

(25 a) $\rho = \frac{\sigma_{g'g''}}{\sigma_{g'} \sigma_{g''}} = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_n^2} = \frac{SNR^2}{SNR^2 + 1}$

or

(25 b) $SNR^2 = \frac{\sigma_g^2}{\sigma_n^2} = \frac{\rho}{1 - \rho}$

By using the approximations

(26) $\sigma_g^2 = \sigma_{g'} \sigma_{g''}$

(27) $\mathrm{tr}\, Q = \sqrt{\mathrm{tr}\, Q' \; \mathrm{tr}\, Q''}$

and

(28) $\sigma_{\Delta g}^2 = 2 \sigma_n^2$

we obtain the following relation for the weight of the observation $\Delta z$:

(29) $w(o', o'') = \frac{\rho}{1 - \rho} \cdot \frac{1}{2\, \sigma_{g'} \sigma_{g''} \sqrt{\mathrm{tr}\, Q' \; \mathrm{tr}\, Q''}}$

Discussion:

1. The weight depends on two terms. The first term reflects the similarity
between the two objects and needs to be calculated for all object pairs. The
second term depends on values obtained separately from both images.

2. The traces $\mathrm{tr}\, Q'$ and $\mathrm{tr}\, Q''$ measure the distinctness or the locatability of the
objects and are critical for the selection of appropriate points. The reason is that
the noise level can be realistically assumed to be constant in both images.

3. The weight is a generalization of the one used by Barnard and Thompson. It
differs in two ways. First, it is independent of brightness and contrast, as we
are only interested in the weight ratios. Second, it takes the texture of the
object into account.

4. A simple and reasonable criterion for rejecting object pairs based on the
correlation coefficient is to require $\rho > 1/2$. This is equivalent to requiring the SNR
to be larger than 1.

5. The main advantage of the separation of the different terms in eq. (29) lies in
its ability to include other measures for similarity. The correlation coefficient
need not be derived from the gray levels but may use other features $f'$ and $f''$
of the objects, e.g.:

a. One could use rotation and scale invariant features, such as the moments
proposed by Wong and Hall [44].

b. One could use a small set of features just to decrease the computation
time, e.g. the low frequency terms of a cosine transform.

c. One could use structural information, the result of a classification or a
linguistic description in combination with statistical measures. The only
requirement for the measure is to have the properties of a correlation
coefficient.

The separation of the correlation coefficient from the variance and the texture of
the gray level function allows one to generalize the weight determination without
losing the information about the geometric distinctness of the object.
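
A weight built from a similarity term $\rho/(1-\rho)$ and a distinctness term, as discussed above, might be computed as in the following sketch. The function names, the cap `rho_max` that prevents a division by zero for $\rho = 1$, and the exact arrangement of the distinctness term (standard deviations times the geometric mean of the two traces) are assumptions of this illustration.

```python
import math

def snr_squared(rho):
    """Squared signal to noise ratio implied by a correlation coefficient."""
    return rho / (1.0 - rho)

def initial_weight(rho, sigma1, sigma2, tr_q1, tr_q2, rho_min=0.5, rho_max=0.999):
    """Initial weight of one object pair.

    rho: correlation coefficient of the two gray level matrices,
    sigma1, sigma2: their standard deviations,
    tr_q1, tr_q2: traces of the coefficient matrices of the two objects.
    Pairs with rho <= rho_min (SNR not larger than 1) get weight zero.
    """
    if rho <= rho_min:
        return 0.0
    rho = min(rho, rho_max)  # assumed cap, keeps the weight finite
    similarity = rho / (1.0 - rho)
    distinctness = 1.0 / (2.0 * sigma1 * sigma2 * math.sqrt(tr_q1 * tr_q2))
    return similarity * distinctness

weak = initial_weight(0.5, 2.0, 2.0, 4.0, 4.0)
good = initial_weight(0.8, 2.0, 2.0, 4.0, 4.0)
better = initial_weight(0.9, 2.0, 2.0, 4.0, 4.0)
```

Only the similarity term has to be evaluated per pair; the distinctness term can be precomputed per object, which is the practical benefit of the separation.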

4.3.4 Interest Operator

We have assumed that the error ellipse representing the covariance matrix of the
coordinate difference is close to a circle. Moreover, we require that the point can be
well located. Measures of both requirements should, in a simple way, be derivable from
the gray level function of the image patch, as they have to be determined for all pixels.
They should also be invariant to rotation; a scale factor will not change the
ranking of the different pixels too much.


As the eigenvalues of the covariance matrix are invariant to rotations, and the trace
equals the sum of the eigenvalues, we will use them also for determining the closeness
of the error ellipse to a circle. Moreover, the eigenvalues of the coefficient matrix, say
$Q'$, and those of its inverse $N' = (Q')^{-1}$ are related by $\lambda_i(Q') = 1/\lambda_i(N')$. Thus, let $\lambda_1$
and $\lambda_2$ be the eigenvalues of $N'$, then the ratio

(30) $q = \frac{4 \det N'}{(\mathrm{tr}\, N')^2} = \frac{4 \lambda_1 \lambda_2}{(\lambda_1 + \lambda_2)^2} = 1 - \left( \frac{\lambda_1 - \lambda_2}{\lambda_1 + \lambda_2} \right)^2$

is an adequate measure for the closeness of the error ellipse to a circle. If $q = 0$ (and
not both $\lambda_1$ and $\lambda_2$ are zero), then $\det N'$ is zero and the matrix is singular. This
means that $g_x$ and $g_y$ are linearly dependent; thus the point may lie on an edge. The
case $q = 1$ is reached only if the eigenvalues are equal ($\lambda_1 - \lambda_2 = 0$), thus representing a
circular error ellipse. The calculation of $q$ need not use the eigenvalues, but rather the
determinant and the trace of $N'$:

(31 a) $\det N' = \sum (g'_x)^2 \cdot \sum (g'_y)^2 - \left( \sum g'_x g'_y \right)^2$

(31 b) $\mathrm{tr}\, N' = \sum (g'_x)^2 + \sum (g'_y)^2$

The sums can be readily derived from the squared and multiplied gradient images by
convolution.

Similarly, one can derive an expression for $\mathrm{tr}\, Q'$:

(32) $\mathrm{tr}\, Q' = \frac{\mathrm{tr}\, N'}{\det N'}$

Thus the selection of interesting points can be accomplished for both images separately
in the following steps:

1. Determination of $\sum g_x^2$, $\sum g_x g_y$, and $\sum g_y^2$;

2. Determination of $\mathrm{tr}\, Q$ and $q$ using eq. (30) - (32);

3. Determination of the interest value, being a preliminary weight,

(33) $w = \begin{cases} \frac{1}{\mathrm{tr}\, Q} = \frac{\det N}{\mathrm{tr}\, N} & \text{for } q > \text{threshold} \\ 0 & \text{otherwise} \end{cases}$

for each pixel;

4. Suppressing all non-maxima in the function $w(i,j)$;

5. All remaining values $w(i,j)$ give rise to an object $o$.
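
Steps 1 to 3 can be sketched for a small image as follows. The gradient operator (simple forward differences), the brute-force window sums, and the function names are assumptions of this illustration; non-maximum suppression (step 4) is left out.

```python
def gradients(img):
    """Forward differences in x and y; the result is one pixel smaller."""
    h, w = len(img), len(img[0])
    gx = [[img[r][c + 1] - img[r][c] for c in range(w - 1)] for r in range(h - 1)]
    gy = [[img[r + 1][c] - img[r][c] for c in range(w - 1)] for r in range(h - 1)]
    return gx, gy

def interest_values(img, half=1, q_min=0.25):
    """Preliminary weight w = det N / tr N where q = 4 det N / (tr N)^2 > q_min."""
    gx, gy = gradients(img)
    h, w = len(gx), len(gx[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(half, h - half):
        for c in range(half, w - half):
            sxx = sxy = syy = 0.0
            for dr in range(-half, half + 1):       # sums over the window
                for dc in range(-half, half + 1):
                    a = gx[r + dr][c + dc]
                    b = gy[r + dr][c + dc]
                    sxx += a * a
                    sxy += a * b
                    syy += b * b
            trace = sxx + syy
            det = sxx * syy - sxy * sxy
            if trace > 0.0 and 4.0 * det / (trace * trace) > q_min:
                out[r][c] = det / trace             # equals 1 / tr Q
    return out

# A straight vertical edge and a corner of a bright square.
edge = [[0.0 if c < 4 else 10.0 for c in range(8)] for r in range(8)]
corner = [[10.0 if (r >= 4 and c >= 4) else 0.0 for c in range(8)] for r in range(8)]
w_edge = interest_values(edge)
w_corner = interest_values(corner)
```

On the straight edge all gradients point in one direction, so $\det N = 0$ and no pixel is selected; at the corner both gradient directions occur in the window and the interest value becomes positive.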

4.4 Algorithmic Solution

4.4.1 The Selection of Objects of Interest

The interest operator eq. (30) to (33) requires the variance and covariance of the
gradient image at each pixel. The window size used should be adaptable to the texture of
the image patch. If one uses a square (in general, a rectangular) window the number of
operations per pixel needed for the interest operator can be made independent of the
window size. This is due to the fact that the array $I(\sum g_x^2)$, say, containing the sums
$\sum g_x^2$ can be derived from $g_x^2$ by convolution with a separable window of size $n_{s1} \times n_{s2}$,
$W(i,j) = 1$ with $W = e_1 e_2^T$ and $e_i^T = (1\, 1 \ldots 1)$ containing $n_{si}$ elements 1. As the
convolution with $e_1$ or $e_2^T$ needs only 2 additions, if done recursively, only 4 additions per pixel
are necessary for the determination of the array $I(\sum g_x^2)$ independent of the window size.
The gradients $g_x$ and $g_y$ are calculated with the Roberts operator.
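
The recursive window sum can be sketched as follows; the function names and the restriction to window positions that lie fully inside the array are choices made for this illustration. Each output value is obtained from its predecessor by adding the entering element and subtracting the leaving one, so the cost per pixel does not depend on the window size.

```python
def running_sum_1d(values, n):
    """Sums over all full windows of length n, two operations per output."""
    if n > len(values):
        return []
    s = sum(values[:n])          # first window is summed directly
    out = [s]
    for i in range(n, len(values)):
        s += values[i] - values[i - n]   # add entering, subtract leaving element
        out.append(s)
    return out

def box_sum_2d(img, n1, n2):
    """Separable box sum with an n1 x n2 window (rows first, then columns)."""
    rows = [running_sum_1d(row, n2) for row in img]
    cols = [running_sum_1d(list(col), n1) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def box_sum_brute(img, n1, n2):
    """Direct summation, used only to check the recursive version."""
    h, w = len(img), len(img[0])
    return [[sum(img[r + dr][c + dc] for dr in range(n1) for dc in range(n2))
             for c in range(w - n2 + 1)]
            for r in range(h - n1 + 1)]

image = [[1, 2, 3, 4, 5],
         [6, 7, 8, 9, 10],
         [11, 12, 13, 14, 15],
         [16, 17, 18, 19, 20]]
fast = box_sum_2d(image, 2, 3)
slow = box_sum_brute(image, 2, 3)
```

Applied to the squared and multiplied gradient images, the same routine yields the three window sums needed for the interest operator.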

Now  two  thresholds  qmjn  and  wm;n  are  necessary  to  check  the  form  and  the  size  of 
the  ellipse: 

Cl:  q-,  > qmm,  (form) 

C2:  wj  > wmin<  (size) 

If  both  conditions  are  fulfilled,  the  interest  value  of  that  pixel  is  set  to  the  preliminary 
weight  w = 1/tr  Q,  otherwise  it  is  zero. 
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The  threshold  qm5n>  is  scale  independent,  a value  of  qmjn.  = 0.25  turned  out  to  be 
reasonable.  The  condition  C2  should  also  be  independent  of  scale.  Therefore,  we  used 
wroin  =f  • Swj/n,  relating  the  preliminary  weights  Wj  to  their  mean  value.  A value  f - 
1.5  was  chosen  for  all  tests  performed. 

From the resulting interest values, w or 0, the relative maxima within a certain
window $n_m \times n_m$ are extracted. The window size $n_m$ for this non-maximum
suppression is independent of the one used for the sums. If the window size $n_m$ is
larger than 3, the non-maximum suppression is accomplished in two steps, the first using
a 3x3 window and the second performing the comparisons in a spiral manner in the large
window, to keep the number of comparisons independent of the window size $n_m$. The
selected objects are then stored in a list containing the coordinates and the preliminary
weight w = 1/tr(Q). They are needed for the similarity measure.
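
The thresholding (conditions C1 and C2) and the non-maximum suppression can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the paper: the function name and arguments are hypothetical, the mean weight is taken over all pixels, and the suppression is a brute-force comparison within the window rather than the two-step 3x3 and spiral search described in the text.

```python
def select_points(w, q, q_min=0.25, f=1.5, n_m=3):
    """w, q: 2-D lists with the preliminary weight and the isotropy measure."""
    rows, cols = len(w), len(w[0])
    # Assumption: the mean preliminary weight is computed over all pixels.
    w_min = f * sum(sum(row) for row in w) / float(rows * cols)
    # Interest value: w where C1 and C2 hold, otherwise zero.
    interest = [[w[r][c] if (q[r][c] > q_min and w[r][c] > w_min) else 0.0
                 for c in range(cols)] for r in range(rows)]
    half = n_m // 2
    selected = []
    for r in range(rows):
        for c in range(cols):
            value = interest[r][c]
            if value <= 0.0:
                continue
            neighbours = [interest[rr][cc]
                          for rr in range(max(0, r - half), min(rows, r + half + 1))
                          for cc in range(max(0, c - half), min(cols, c + half + 1))
                          if (rr, cc) != (r, c)]
            # Keep the pixel if no neighbour in the window is larger (ties are kept).
            if all(value >= other for other in neighbours):
                selected.append((r, c, value))
    return selected
```

The returned triples (row, column, weight) correspond to the list of selected objects passed on to the similarity measure.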

4.4.2  The Selection of Object Pairs

The initial weight w from eq. (29), in addition to tr Q' and tr Q'', requires the
standard deviations $\sigma_{g'}$ and $\sigma_{g''}$ and the correlation coefficient
$\rho = \sigma_{g'g''} / (\sigma_{g'} \sigma_{g''})$, where:

(34a)  $\sigma_{g'}^2 = \frac{\sum (g')^2 - (\sum g')^2 / n}{n - 1}$

(34b)  $\sigma_{g''}^2 = \frac{\sum (g'')^2 - (\sum g'')^2 / n}{n - 1}$

and

(34c)  $\sigma_{g'g''} = \frac{\sum g'g'' - \sum g' \sum g'' / n}{n - 1}$

The sums $\sum g'$, $\sum g''$, $\sum (g')^2$, and $\sum (g'')^2$ are calculated for each
object. The mixed sum $\sum g'g''$ is only calculated for pairs of objects with a distance
$|z' - z''|$ less than a given threshold $d_{\max}$, which reflects the maximum expected
distance between corresponding objects. All pairs of objects for which the correlation
coefficient $\rho$ is greater than 0.5 are collected in a list containing their
coordinates z' and z'' and their weights.
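
A minimal sketch of this pair selection is given below. It assumes the usual sample variance and covariance with denominator n - 1, represents each object as a hypothetical tuple (x, y, gray_values) with gray values flattened into a list of equal length, and uses illustrative function names. As in the text, the mixed sum is only formed for pairs closer than d_max.

```python
import math


def correlation(g1, g2):
    """Correlation coefficient of two gray value lists, computed from plain sums."""
    n = len(g1)
    s1, s2 = sum(g1), sum(g2)
    s11 = sum(v * v for v in g1)
    s22 = sum(v * v for v in g2)
    s12 = sum(a * b for a, b in zip(g1, g2))      # mixed sum
    var1 = (s11 - s1 * s1 / float(n)) / (n - 1)
    var2 = (s22 - s2 * s2 / float(n)) / (n - 1)
    cov = (s12 - s1 * s2 / float(n)) / (n - 1)
    if var1 <= 0.0 or var2 <= 0.0:
        return 0.0                                 # flat patch: no usable correlation
    return cov / math.sqrt(var1 * var2)


def candidate_pairs(left, right, d_max, rho_min=0.5):
    """left, right: lists of (x, y, gray_values); returns (i, j, rho) candidates."""
    pairs = []
    for i, (x1, y1, g1) in enumerate(left):
        for j, (x2, y2, g2) in enumerate(right):
            if math.hypot(x1 - x2, y1 - y2) >= d_max:
                continue                           # too far apart: skip the mixed sum
            rho = correlation(g1, g2)
            if rho > rho_min:
                pairs.append((i, j, rho))
    return pairs
```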

4.4.3  The Selection of Corresponding Points

The  selection  of  the  corresponding  points  is  based  on  the  assumed  geometrical  rela- 
tion between  the  images.  In  our  context  an  affine  transformation  seems  to  be  adequate, 
and  therefore,  has  been  employed.  The  robust  adjustment  is  split  into  two  steps. 
First,  only  the  shift  between  the  images  is  determined.  This  leads  to  better  approxima- 
tions, both  for  the  shifts  and  the  weights  in  the  following  6 parameter  transformation. 
Both  adjustments  have  the  same  structure. 

In each iteration, the parameters, the residuals, the precision of the shift, and the
average weight are determined, and the weights are adapted for the next iteration. If a
weight is smaller than a certain percentage (say, 10%) of the average weight, it is set to
zero, eliminating that observation. The first 4 iterations are performed with the weight
function given by eq. (20b), after which the redescending function in eq. (22b) is applied.
The algorithm stops if either the required precision of the shift is reached, not enough
corresponding points are left, or a pre-set number of iterations is reached. The
residuals of the last iteration are tested, and with all residuals passing this test one
additional iteration with equal weights is performed to obtain the final transformation
parameters.
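
The structure of the shift adjustment can be sketched as an iteratively reweighted mean. The sketch rests on several assumptions that are not taken from the paper: the two weight factors are simple stand-ins for eqs. (20b) and (22b), the residual scale `scale` is a fixed hypothetical parameter, the stop on the precision of the shift and the residual test are omitted, and the final equal-weight step simply uses all observations that were not eliminated.

```python
import math


def weighted_shift(pairs, weights):
    """Weighted mean of the coordinate differences; pairs are (x1, y1, x2, y2)."""
    total = sum(weights)
    dx = sum(w * (p[2] - p[0]) for p, w in zip(pairs, weights)) / total
    dy = sum(w * (p[3] - p[1]) for p, w in zip(pairs, weights)) / total
    return dx, dy


def robust_shift(pairs, initial_weights, scale=2.0, max_iter=10,
                 cutoff=0.10, min_pairs=3):
    active = [w > 0 for w in initial_weights]
    weights = list(initial_weights)
    dx = dy = 0.0
    for iteration in range(max_iter):
        if sum(active) < min_pairs or sum(weights) <= 0.0:
            break
        dx, dy = weighted_shift(pairs, weights)
        for k, ((x1, y1, x2, y2), w0) in enumerate(zip(pairs, initial_weights)):
            if not active[k]:
                continue
            r = math.hypot(x2 - x1 - dx, y2 - y1 - dy) / scale
            if iteration < 4:
                factor = 1.0 / math.sqrt(1.0 + r * r)   # stand-in for eq. (20b)
            else:
                factor = math.exp(-0.5 * r * r)         # stand-in, redescending
            weights[k] = w0 * factor
        mean_w = sum(w for w, a in zip(weights, active) if a) / sum(active)
        for k in range(len(weights)):
            # Eliminate observations far below the average weight.
            if active[k] and weights[k] < cutoff * mean_w:
                active[k] = False
                weights[k] = 0.0
    if any(active):
        # Final pass with equal weights over the surviving observations.
        dx, dy = weighted_shift(pairs, [1.0 if a else 0.0 for a in active])
    return (dx, dy), active
```

The same loop structure would carry over to the 6 parameter transformation, with the weighted mean replaced by a weighted least squares solution.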

The list of corresponding points obtained may still be ambiguous, as the same
point in one image might correspond to several points of a cluster in the other image.
The list of point pairs is then cleaned by keeping, among competing correspondences,
those which have the smaller residuals.
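
One simple way to realize this cleaning step is a greedy pass over the pairs in order of increasing residual. The text only states that the correspondences with the smaller residuals are kept, so the greedy strategy, the function name, and the tuple layout (left point, right point, residual) are assumptions made for illustration.

```python
def clean_pairs(pairs):
    """pairs: list of (left_id, right_id, residual); returns a one-to-one subset."""
    kept = []
    used_left, used_right = set(), set()
    for left, right, residual in sorted(pairs, key=lambda p: p[2]):
        if left in used_left or right in used_right:
            continue                    # a better pair already uses this point
        kept.append((left, right, residual))
        used_left.add(left)
        used_right.add(right)
    return sorted(kept)
```

With residuals chosen for illustration only, a call such as `clean_pairs([(3, 7, 0.8), (3, 9, 2.1)])` keeps the pair (3, 7) and drops (3, 9).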


4.5  Two  Examples 

The following two examples are presented to show the performance of the new
algorithm. In both cases, the two images f' and f'' are derived from an original image by
extracting two separate windows and distorting them by an affine transformation
according to eq. (12b) with random numbers in B ranging up to 0.15. Thus, the
average linear distortion is approximately 10% or 6°. The extracted windows are
contaminated with white Gaussian noise with a standard deviation of $\sigma = 15$ gray
levels. Both windows are then smoothed with a 3x3 Hanning filter $(1\ 2\ 1)^T * (1\ 2\ 1)$.
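
The separable smoothing filter mentioned above can be sketched as follows. The normalization by 16 (so that the gray level mean is preserved) and the treatment of the border (only interior pixels are returned) are assumptions; the function name is hypothetical.

```python
def smooth_121(image):
    """Apply (1 2 1)^T * (1 2 1) / 16; returns the interior (rows-2 x cols-2)."""
    # Horizontal pass with (1 2 1).
    rows = [[row[c - 1] + 2 * row[c] + row[c + 1] for c in range(1, len(row) - 1)]
            for row in image]
    # Vertical pass with (1 2 1)^T, followed by the normalization.
    return [[(rows[r - 1][c] + 2 * rows[r][c] + rows[r + 1][c]) / 16.0
             for c in range(len(rows[0]))]
            for r in range(1, len(rows) - 1)]
```

A constant image is left unchanged by this filter, while a single bright pixel is spread over its 3x3 neighbourhood with weights 1, 2, and 4 (divided by 16).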


1.  The  first  example  is  based  on  an  artificial  image  (cf.  Figure  11).  It  may  represent 
a part  of  a rural  scene  with  some  light  roads  between  fields  of  different  brightness. 
The  dark  pixels  are  the  points  selected  by  the  interest  operator.  Table  7 contains 
the  preliminary  weight  w and  the  values  of  q in  percent  describing  the  closeness 
of  the  error  ellipse  to  a circle. 

Observe that some points, e.g. point 1 in the right image, lie on an edge, but
have been selected due to the irregularity of the edges. Both values, w and q, are
small (w = 311, q = 36% in this case). From the 15x16 = 240 possible point
pairs, 59 were selected as possible candidates for correspondence. Their weights
vary considerably, namely due to the correlation coefficient (cf. Table 8). The
robust shift adjustment yields the pairs listed in Table 9. It shows the ambiguity
of the result, as for example point 3 in the left image is connected with points 7
and 9 in the right image. As the residuals of the pair (3,7) are smaller than those
of (3,9), the pair (3,7) is kept. The cleaned list in Table 9b would be the result
with the shift parameters only, showing that even with a wrong geometric model
nearly all corresponding objects can be found (cf. Table 10b).

The result of the robust affine transformation (cf. Table 10) is slightly
different. The final correspondences are shown in Figure 12. If one compares
the final result with the list of the candidate pairs (Table 8), obviously the
Figure 11  Example 1: Artificial Image Pair With Selected Points (black pixels)

NOTE:  Window size of interest operators:
       Window size for non-maximum suppression:
Table 7  Example 1: List of Selected Points

NOTE:  x,y  coordinates
       w    interest value
       q    measure for isotropy of error ellipse (in percent)


Table 8  Example 1: List of Selected Pairs

NOTE:  ij point No. in left and right image (201 = (2,1)), w initial
weight, w', w'' preliminary weights, rho correlation coefficient.

 No.     ij         w      w'     w''        rho
   1    201     4.737     410     311   0.813910
   2    206    18.307     410     469   0.934143
   3    307     6.047     907     423   0.743084
   4    309     9.129     907     708   0.769720
   5    406     3.333     364     469   0.733839
   6    408     8.666     364     443   0.872919
   7    303    27.938     410     386   0.932111
   8    310     2.032     410     280   0.372882
   9    603    16.436     430     386   0.883114
  10    608     2.733     430     443   0.623202
  11    708    16.076     714     443   0.893973
  12    808    11.647     483     443   0.899332
  13    810     1.838     483     280   0.389371
  14    906     4.333     343     469   0.777062
  15    908    11.360     343     443   0.899266
  16    911     2.187     343     323   0.399432
  17   1002     2.891     961     300   0.611310
  18   1003     2.434     961     288   0.611800
  19   1004     2.378     961     688   0.319324
  20   1007     2.692     961     423   0.362607
  21   1102     2.333     964     300   0.613341
  22   1103     8.660     964     288   0.874139
  23   1104    33.190     964     688   0.966364
  24   1107     2.328     964     423   0.601318
  25   1113   133.224     964    1374   0.983662
  26   1114     3.310     964     368   0.681337
  27   1113    13.244     964     308   0.891936
  28   1116     7.161     964     340   0.812974
  29   1212     4.827     316     719   0.776933
  30   1312    22.732     622     718   0.919691
  31   1313     7.178     622    1374   0.764347
  32   1317     1.891     622     314   0.379333
  33   1403     2.993     963     288   0.689392
  34   1404    11.141     963     688   0.843669
  35   1407     3.303     963     423   0.743271
  36   1409     2.978     963     708   0.337126
  37   1413    13.047     963    1374   0.861033
  38   1415     4.759     963     308   0.732863
  39   1416     3.142     963     340   0.638196
  40   1312     7.280     448     718   0.800946
  41   1313     2.030     448     508   0.384841
  42   1516     2.123     448     340   0.389307
  43   1317    12.494     448     314   0.908975
  44   1319    24.336     448     396   0.943966
  45   1322    11.948     448     446   0.874231
  46   1609     9.838    1871     708   0.746461
  47   1611     3.331    1871     323   0.336333
  48   1623   129.139    1871    1683   0.970396
  49   1706    16.139    1318     469   0.879777
  50   1708     2.468    1318     443   0.333313
  51   1717     6.066    1318     314   0.764804
  52   1719     4.692    1318     396   0.683444
  53   1810     1.367     867     280   0.304329
  54   1824     3.183     867     344   0.779709
  55   1914     3.144     311     368   0.693980
  56   1913     7.749     311     308   0.843612
  57   1916     6.314     311     540   0.810626
  58   2013     3.123     640    1374   0.631036
  59   2323    29.446     686    1683   0.911693


estimated  shift:  11.077  -12.615 
Table 9  Example 1: Result of Robust Shift Adjustment

a)  uncleaned list, containing ambiguities

b)  cleaned list

NOTE:  6 iterations
Table 10  Example 1: Result of Robust Affine Transformation

a)  uncleaned list, containing ambiguities

b)  cleaned list, final result (cf. Figure 12)

NOTE:  6 iterations
Figure 12  Example 1: Result of Correspondence Algorithm.
most similar objects are also correspondent ones. The object pair (8,10) with the
largest residuals is found by chance, as both points are just above the level of
distinctness. But observe that the objects in pair (5,5) are more similar than those
in pair (6,5). The context, i.e. the common geometrical model, however, selects
the pair (6,5) due to its better fit, which seems to be reasonable, as can be seen
from Figure 12. The final transformation parameters show scale differences up to
20% between the two images.

2.  The second example is based on an image from the Arizona Test Area. The
resolution of the original image has been reduced by a factor of two, yielding pixel
sizes of 50 μm. The selected windows of 80x80 pixels with the interesting points
are shown in Figure 13.

39 and 50 points have been selected, almost all having error ellipses close to a
circle (cf. Table 11). From the 1950 possible pairs, 127 were retained as
candidates (cf. Table 12). Observe that the weights in this case do not vary so much
as in the first example, and are considerably smaller. The final result yields 18
object pairs and is shown in Figure 14 (cf. Tables 13 and 14). Also in this case
the scale difference is approximately 20%, but in addition a rotation of
approximately 10° in both axes becomes apparent from Figure 14. The shifts of +4 and
+18 pixels correspond to an overlap of the two windows of approximately 70%.
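
The quoted overlap can be checked with a short calculation. The sketch assumes a pure shift between two 80x80 windows and neglects the rotation and scale difference; the function name is hypothetical.

```python
def overlap_fraction(size, shift_x, shift_y):
    """Fraction of a size x size window shared with a copy shifted by (shift_x, shift_y)."""
    overlap_x = max(0, size - abs(shift_x))
    overlap_y = max(0, size - abs(shift_y))
    return (overlap_x * overlap_y) / float(size * size)


# (80 - 4) * (80 - 18) / 80**2 = 4712 / 6400
print(round(overlap_fraction(80, 4, 18), 3))   # 0.736
```

The result of about 74% is in line with the "approximately 70%" stated above.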
Figure 13  Example 2: Image Pair from Arizona Test Area With Selected Points (black pixels)


Table 11  Example 2: List of Selected Points

NOTE:  x,y  coordinates
       w    interest value
       q    measure for isotropy of error ellipse (in percent)
Table 12  Example 2: List of Selected Pairs

NOTE:  ij  point  No.  in  left  and  right  image  (201  = (2,1)) 
Table 13  Example 2: Result of Robust Shift Adjustment

NOTE:  7 iterations, list had not been cleaned

correspondences were ambiguous.
4.6  Preliminary Conclusions

From other experiments with simulated and real data the following preliminary
conclusions may be drawn:

a.  The algorithm in its present form works well if the relative distortions of the
images are not larger than 20-30% (corresponding to a rotation of up to 20°) and
the overlapping area contains enough distinct points. These conditions can
always be met if an operator provides the approximate values, or if the images are
oriented with an automatic procedure such as the LNK-Method.

b.  The results are accurate up to 1-2 pixels, if the deviation of the geometrical model
from the real distortion is not too large.

c.  The algorithm is fast enough to replace the first iterations in a correlation-based
algorithm for high precision registration or rectification. The total computing
time for a pair of images with 128x128 pixels is approximately 2-3 seconds on a
VAX 11/780 and is nearly proportional to the number of pixels.

d.  The limitation of the algorithm in its present form results from the similarity
measure, namely the correlation coefficient, which is not scale or rotation
independent. As already pointed out, other measures, for example invariant moments,
might solve this problem.
Further research should be directed towards a link with the features of the
LNK-method. There are two ways to do that which are complementary: One could
use robust estimation procedures to refine the estimation of the LNK-method, and
one could use the abstract features, especially the abstract lines, as input for the
correspondence algorithm. In this case each abstract line would give rise to four
observation equations, derived from the coordinates of one end point of the line
and the coordinate differences to the other end point. If consideration is
restricted to rotation and scale differences only, the angular difference and the
logarithm of the scale ratio of the pairs of abstract edges would lead to a robust
estimation of the means of the shift, the rotation and the logarithm of the scale
difference of the two images. The inclusion of line features into the algorithm
would allow its application for rectification of satellite images.


5.  CONCLUSIONS AND RECOMMENDATIONS


5.1  Conclusions

From  the  research  performed  so  far,  the  following  are  the  conclusions  to  be  drawn: 

1.  The  collinearity  (or  parametric)  model  is  superior  to  the  polynomial  (or  interpola- 
tive)  model  particularly  when  the  number  of  control  points  is  small. 

2.  Through  simulations,  it  is  shown  that  the  parametric  model  adequately  describes 
the  real  data. 

3.  Rectification  of  single  image  scanner  data  is  more  sensitive  to  image  position 
errors  than  ground  position  errors. 

4.  Uncertainty  in  attitude  estimate  is  the  main  source  of  error  in  system-corrected 
images. 

5.  In general, when more than about 25 well distributed control points are used, the
further effect on rectification accuracy is marginal.

6.  The distribution of control features is critical to the rectification accuracy; to
obtain the same accuracy, about three times the number of well distributed
control features are needed when such features are randomly distributed.

7.  The block adjustment procedure based on the parametric rectification model was
successful. Tie points between overlapping images improved rectification accuracy,
particularly when few control points are used.

8.  Edges  proved  to  be  an  effective  type  of  control  for  single  image  rectification.  In 
general,  about  three  edge  pairs  are  needed  for  each  conventional  control  point. 

9.  An  efficient  new  algorithm  for  finding  corresponding  points  in  image  pairs  has 
been  developed.  The  unknown  parameters  of  the  geometric  transformation 
between  the  two  images  are  derived  using  robust  estimation  techniques. 

10.  Tests with simulated and real data show that the present correspondence
algorithm can accommodate geometric distortions up to 20 to 30%, which corresponds
to an average distortion of 3 to 7 pixels in an image of size 128x128 pixels.

11.  The correspondence algorithm incorporates a new operator for finding distinct
objects in an image based on the expected precision of locating such objects by
cross correlation.
5.2  Recommendations


1.  Continue to investigate other non-conventional control such as geometric
constraints and relative control (e.g. distances, angles etc.).

2.  Extend the block adjustment program to accommodate edge control and perform
tests.

3.  Continue  to  develop  the  correspondence  algorithm  and  apply  it  to  remote  sensing 
data  both  for  registration  and  rectification. 

4.  Study  the  rectification/registration  sequence. 

5.  Investigate  rectification  accuracy  assessment. 

6.  Analyse  blunder  detection  and  identification  procedures. 

7.  Research  the  problem  of  merging  remote  sensing  data  and  digital  terrain  models. 
ABSTRACT

An  investigation  of  the  optimum  number  of  ground  control  points 
required  to  rectify  a full  scene  or  a portion  of  a Landsat  MSS  scene 
was  conducted  on  data  from  southeastern  Louisiana/southwestern 
Mississippi  and  eastern  Kansas.  The  ground  control  points  utilized 
were  randomly  distributed  across  the  partial  or  full  scene.  This  work 
suggests that 24 ground control points are more than adequate to rectify
a partial  or  full  scene  of  Landsat  MSS  data.  An  additional  study 
examined  the  error  incurred  in  choosing  ground  control  points 
representing  artificial  versus  natural  features. 


* 


Introduction 


This study involves an investigation of the geometric accuracy of
scene-to-map registration products of Landsat multispectral scanner
(MSS) data. The rectification of Landsat MSS data to a Universal
Transverse Mercator (UTM) map base is an important pre-processing step
in the analysis of earth resources science data. Potential
applications that flow from an accurate scene-to-map rectification process
include:

1.  component of a multisource data base

2.  development  of  change  detection  products 

3.  input  to  a habitat  classification 

The accuracy with which ground control points (GCPs) can be
selected is an important source of error in the construction of the mapping
equations which relate Landsat scene coordinates to map coordinates
(northings and eastings in the UTM system). The use of a non-linear
transformation in the mapping equations may not be justified when one
considers the accuracy with which ground control points can be
selected (Steiner and Kirby [9]). A study of ground control point selection
accuracy revealed that (Steiner and Kirby [9]):

1.  GCPs can be selected more accurately on maps than on Landsat
images.

2.  GCPs can be measured more accurately on man-made features (road
intersections) than on natural features (land-water
interfaces).

A commonly utilized mapping equation is the affine transformation,
which is equivalent to a first degree polynomial. The properties of
the affine transformation in relation to geometrical rectification of
Landsat data are discussed by Kirby and Steiner [5], Steiner and Kirby
[9], Van Wie and Stein [10], Horn and Woodham [4], Emmert and McGillem
[2], and Wong [12]. In comparing the UTM map control points and the
Landsat scene control points of the same objects utilizing an affine
transformation, a linear least squares approximation is used; it
generates residuals which measure how well the data fit the mapping
equation. The root mean square (RMS) value of the residuals is a measure
of the degree of fit. The residuals stem from nonlinear distortions in
satellite orbit and attitude, errors attributable to curvature of lines
due to earth rotation and map projection, scanner mirror velocity
non-linearity, and random variation. The affine transformation accounts for
distortions due to translation, scale change, rotation, aspect
ratio, and skew (Van Wie and Stein [10]). An analysis of the component
sources of error in the residual error term for two Landsat MSS frames
found that the "other" category (attitude errors) was generally
larger than the transformation error component or the point
measurement error component (Steiner and Kirby [9]). Another source of
distortion considered by the same authors (Kirby and Steiner [5]) is
the difference in geometry between the UTM projection and the Landsat
MSS scene. The affine transformation does not compensate for this
distortion (called the geometric base problem).
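
As an illustration of the least squares fit described above, the following sketch estimates the six coefficients of a first degree polynomial from point pairs and computes the RMS of the residuals. It is not the software used in the study: the function names and the (column, row) to (easting, northing) direction of the mapping are assumptions, and the RMS is taken here as the square root of the mean squared residual vector length.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def fit_affine(scene, utm):
    """Fit E = a0 + a1*col + a2*row and N = b0 + b1*col + b2*row by least squares."""
    normal = [[0.0] * 3 for _ in range(3)]
    rhs_e, rhs_n = [0.0] * 3, [0.0] * 3
    for (col, row), (east, north) in zip(scene, utm):
        basis = (1.0, col, row)
        for i in range(3):
            rhs_e[i] += basis[i] * east
            rhs_n[i] += basis[i] * north
            for j in range(3):
                normal[i][j] += basis[i] * basis[j]   # normal equation matrix
    return solve3(normal, rhs_e), solve3(normal, rhs_n)


def rms_residual(scene, utm, coeff_e, coeff_n):
    """Root mean square of the residual vectors of the fitted mapping."""
    total = 0.0
    for (col, row), (east, north) in zip(scene, utm):
        de = coeff_e[0] + coeff_e[1] * col + coeff_e[2] * row - east
        dn = coeff_n[0] + coeff_n[1] * col + coeff_n[2] * row - north
        total += de * de + dn * dn
    return math.sqrt(total / len(scene))
```

At least three control points that are not collinear are needed; with more points the residuals, and hence the RMS value, indicate how well a first degree polynomial describes the data.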

A number of investigators have employed polynomials of a higher
degree as mapping equations. Wong [12] reported an RMS value of ±57 m
for a 20 term polynomial, while the RMS value of a first degree
polynomial applied to the same Landsat frame was ±115 m. There is a
tradeoff involved, however, in that at least 20 GCPs must be used per
frame to provide a least squares solution for a 20 term polynomial (up
to 30 GCPs would have to be used in practice). Not only must the
higher degree polynomial use a large number of GCPs, but the GCPs must
be well distributed near the edges and corners of the frame (Van Wie
and Stein [10]). For products issued by the EROS master data
processor (MDP) to produce P-format tapes (spatially and
radiometrically corrected), the number of GCPs used can be related to the
scene-to-map registration accuracy. If 25 to 50 GCPs are used, the
rectification accuracy should be within 1 pixel more than 99% of the
time. For 8 to 24 GCPs the rectification accuracy should be within 10
pixels, while for 1 to 7 GCPs the rectification accuracy should be
within 20 pixels more than 99% of the time (Nelson and Grebowsky [7]).

A recent study of Landsat-4 P-format rectification accuracy
analyzed the sources of error due to locating GCPs accurately,
digitizing and map distortions, and relief variations (Welch and Usery
[11]). For MSS data the location error was ±30-40 m ($rmse_{xy}$, or root
mean square error vector), the map and digitizing error was ±10-15 m,
and the terrain relief error was roughly ±10-30 m. The root mean
square error vector is computed from the deviation between the mapping
equation and the withheld ground test point locations. When a first
degree polynomial was employed with 10 or more GCPs for a whole
scene, the $rmse_{xy}$ value was ±80 m. The use of a third degree
polynomial with 30 or more GCPs produced an $rmse_{xy}$ value of ±55 m. For a
1024 by 1024 pixel area, 15 GCPs were used with first and second
degree polynomials to produce a minimum $rmse_{xy}$ value of ±45 m. The
$rmse_{xy}$ value increased to ±60 m when only 10 GCPs were utilized. For
a 256 by 256 pixel area, 5 GCPs and a linear first order polynomial
yielded an $rmse_{xy}$ value of ±40 m.

The affine transformation and higher degree polynomials are
examples of interpolative or surface fitting models. The other type of
model used in rectification is the parametric model, which incorporates
information on satellite position and sensor attitude (Mikhail and
Paderes [6]). The use of parametric modelling in rectification has
been described by Mikhail and Paderes [6], Horn and Woodham [4], and
Sawada et al. [8]. Parametric modelling includes two components:
sensor modelling and platform modelling. Sensor modelling corrects
for panoramic effect (pixel projection on a plane), non-linearity of
scanning, and unequal number of pixels per scan. Platform modelling
deals with problems associated with sensor attitude and the satellite
position in orbit. Mikhail and Paderes [6] describe this approach in
some detail. The satellite collinearity equation was used to combine
the sensor and platform models. Ground control points are used to
estimate the unknown parameters in the sensor and platform models (19
unknown parameters existed in this study). The conclusions of the
investigation conducted by Mikhail and Paderes [6] include:

1.  The  maximum  rectification  accuracy  for  a polynomial  model  is 
about  half  a pixel. 

2.  Rectification  accuracy  is  not  significantly  improved  when  the 
number  of  GCPs  utilized  exceeds  25. 


3.  The rectification accuracy is sensitive to the identification
accuracy of a GCP in the Landsat image, but is insensitive to
the accuracy of identifying a GCP on the map.

Recent  work  by  Mikhail  and  Paderes  (personal  communication)  re- 
ported that  the  collinearity  model  gave  equal  or  lower  RMS  values  for 
the  same  number  of  ground  control  points  than  did  a polynomial  model. 
The difference in RMS values between the collinearity and polynomial
models is more pronounced for 10 GCPs than it is for more than 40
GCPs. The same conclusions were arrived at using synthetic data as
were  determined  from  using  real  Landsat  MSS  data  from  Kansas  and 
Louisiana. 


Methods 

The Landsat MSS frames used in this study were acquired over
path: 23 and row: 39 of the worldwide reference system (southeastern
Louisiana - coastal Mississippi) and over path: 29 and row: 33 (western
Missouri - eastern Kansas). The Kansas data was collected on 11/9/81,
while the Louisiana data was collected on 11/21/81. Both Landsat MSS
scenes had 10% cloud cover. The Louisiana scene was relatively flat
(elevation: 0 to 362 feet above sea level) and contained up to 35%
open water. The Kansas scene was hilly (elevation: 730 to 1450 feet
above sea level) with negligible amounts of open water. The extensive
amount of open water and wetlands in the Louisiana scene presents a
significant challenge for accurate rectification when compared to the
Kansas Landsat frame.



The points to be utilized as ground control points (GCPs) and
ground reference points (GRPs) were chosen on 1:24,000 scale, 7.5
minute quadrangle sheets produced by the U.S. Geological Survey
(USGS). The GCPs were used to generate the mapping equations used in
the georegistration procedure, while the GRPs were employed as test
points to independently assess the accuracy of the georegistration
procedure. The map coordinates of the ground points were recorded in the
UTM system as northings and eastings, while the Landsat scene coordinates
were recorded as rows and elements. The same points were identified
on the 7.5 minute USGS quadrangle sheet and the Landsat A-format MSS
scene. The types of features used as ground points included manmade
(road intersections) and natural (river intersections) categories.
Table 1 gives some examples of ground points utilized in the Louisiana
Landsat frame.

For  the  whole  scene  analysis,  356  ground  points  were  used  in  the 
Louisiana  data  set  and  359  ground  points  were  used  in  the  Kansas  data 
set.  For  the  half  scene  analysis  the  number  of  ground  points  avail- 
able was  242  for  Louisiana  and  241  for  Kansas.  For  the  quarter  scene 
analysis  the  number  of  ground  points  utilized  was  182  to  198  for 
Louisiana  (Areas  A and  B)  and  150  to  158  for  Kansas  (Areas  A and  B). 
The  ground  points  available  were  divided  into  GCPs  and  GRPs. 

The mapping equation utilized was a linear polynomial, and the fit
of the GCPs to the mapping equation was quantified by the computation
of the RMS value (in meters). To evaluate the georegistration
accuracy of the Landsat MSS product, the procedure of Graham and
Luebbe [3] was employed. This procedure quantifies the
georegistration accuracy in terms of RBIAS (row offset), CBIAS (column offset),
RSD (row standard deviation), and CSD (column standard deviation).
Good georegistration accuracy would be characterized by sub-pixel
offsets and standard deviation values. The equations for computing
bias and standard deviation are:

(1)  $\mathrm{RBIAS} = \frac{1}{NP} \sum_{i=1}^{NP} (\mathrm{ROW1}_i - \mathrm{ROW2}_i)$

(2)  $\mathrm{RSD} = \left[ \frac{\sum_{i=1}^{NP} (\mathrm{ROW1}_i - \mathrm{ROW2}_i - \mathrm{RBIAS})^2}{NP - 1} \right]^{1/2}$

where NP is the number of GRPs chosen, ROW1 is the Landsat row
predicted from the mapping equation, and ROW2 is the Landsat row read
from the MSS imagery. The units of RBIAS and RSD are pixels.
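
Equations (1) and (2) translate directly into a few lines of code. The sketch below assumes the standard deviation form with the square root and the denominator NP - 1; the function name is hypothetical, and the same routine would serve for CBIAS and CSD when given column values.

```python
import math


def bias_and_sd(predicted, observed):
    """Return (bias, standard deviation) of predicted minus observed positions."""
    diffs = [p - o for p, o in zip(predicted, observed)]
    n = len(diffs)
    bias = sum(diffs) / float(n)
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, sd


# Illustrative values only: four test points with row differences 0.5, 0, 0.5, 1.0.
rbias, rsd = bias_and_sd([10.5, 20.0, 30.5, 41.0], [10.0, 20.0, 30.0, 40.0])
print(round(rbias, 3), round(rsd, 3))   # 0.5 0.408
```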

The ERL computer software module GNRI was utilized to take a
random sample of GCPs from the overall ground point list for both the
Kansas and Louisiana data sets. The random samples were chosen in
groups of eight and combined with the previously chosen GCPs. Groups
of eight were utilized because the quality assessment number
associated with the registration accuracy of Landsat P-format MSS tapes employs
multiples of eight. The module CSPA was utilized to compute "R"
values, which give a measure of the spatial distribution of ground
control points (Dow [1]). For the purposes of this paper, "R" values
between 0.7 and 1.3 are indicative of a random spatial distribution.
The module BMGC was utilized to compute the bias and standard
deviation values as well as the RMS numbers.
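
The incremental sampling scheme can be sketched as below. This is not the GNRI module: the function name, the seed handling, and the convention that all points not yet drawn serve as test points (GRPs) at each stage are assumptions made for illustration.

```python
import random


def gcp_samples(point_ids, group_size=8, n_groups=3, seed=0):
    """Draw GCPs in groups, each new group added to those already chosen."""
    rng = random.Random(seed)          # fixed seed so the draw can be repeated
    remaining = list(point_ids)
    rng.shuffle(remaining)
    chosen = []
    stages = []
    for _ in range(n_groups):
        group, remaining = remaining[:group_size], remaining[group_size:]
        if len(group) < group_size:
            break                      # not enough points left for a full group
        chosen = chosen + group
        stages.append((list(chosen), list(remaining)))   # (GCPs, GRPs)
    return stages
```

Each stage then yields 8, 16, 24, ... control points for fitting the mapping equation, with the remaining points available for the accuracy assessment.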
A separate test was conducted, in conjunction with the overall
study, to see how accurately a given ground control point could be
selected. Ten ground control points representing natural features and
ten ground control points representing manmade features were picked at
random from the Louisiana P-format MSS data set. This experiment was
the only case in this paper in which P-format data was used. These
ground control points were reselected ten times in order to see how
much operator error was introduced in GCP selection. The operator
located the ground control point on a 1:24,000 scale USGS map and then
moved the track ball cursor on the digital image display device until
the same ground point was identified on the display screen. This
procedure was replicated ten times; the data processor, who recorded
the Landsat scene coordinates, did not inform the track
ball operator of the results. Table 1 lists the characteristics of
the ground control points used in this study.

Most of the statistical analysis utilized in this report was
generated using the BMDP Statistical Package (Dixon et al. [0]). The
descriptive statistics (mean, standard deviation, standard error of
the mean) and analysis of variance were run using program BMDP7D. The
analysis of variance model was tested for equality of variances using
Levene's test, and if the Levene's test results were statistically
significant, then the Brown-Forsythe procedure was used for the
analysis of variance computations (Dixon et al. [0]). Duncan's
multiple range test was employed to separate out significant treatment
effects in those cases where the analysis of variance results were
statistically significant at the 5% level. The correlation analysis
was carried out with program BMDP6D.
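
As an illustration of the variance-equality check that precedes the analysis of variance, the sketch below computes the classical one-way F statistic and Levene's statistic in plain Python. It is not the BMDP7D implementation; the function names are hypothetical, and the use of absolute deviations from the group means (rather than medians) is an assumption. Levene's statistic is simply the one-way F statistic applied to those absolute deviations.

```python
from statistics import mean


def one_way_f(groups):
    """Classical one-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more points than groups")
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("within-group variation is zero")
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def levene_w(groups):
    """Levene's statistic: one-way F on absolute deviations from group means."""
    abs_dev = [[abs(x - mean(g)) for x in g] for g in groups]
    return one_way_f(abs_dev)


if __name__ == "__main__":
    # Made-up RBIAS-like replicates for three values of N
    samples = [[0.38, 0.41, 0.30], [0.20, 0.22, 0.19], [0.17, 0.15, 0.18]]
    print("F =", one_way_f(samples))
    print("Levene W =", levene_w(samples))
```

Both statistics are referred to an F distribution with (k - 1, n - k) degrees of freedom, where k is the number of groups and n the total number of observations. When Levene's statistic exceeds the critical value, a variance-robust procedure such as Brown-Forsythe replaces the classical F test.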

Results  and  Discussion 

The ground control point accuracy experiment found a row bias of
0.04 pixels and a column bias of 0.12 pixels for natural features.
For manmade features the row bias was 0.12 pixels and the column bias
was 0.04 pixels. It appears from these results that manmade and
natural features can be chosen with equal accuracy. Also, the operator
bias in ground control point selection does not represent a serious
source of error in the scene-to-map registration procedure.

The results for the optimum number of GCPs needed to rectify a
given portion of a Landsat MSS scene in Louisiana and Kansas are given
in Tables 2 through 6. The "N" column gives the number of ground
control points used to develop the mapping equation. The "R" column
gives an indication of the type of spatial distribution that the GCPs
exhibit across the Landsat scene. The "RMS" is a measure of how well
the GCPs fit the mapping equation. The accuracy of the
georegistration procedure is measured by the RBIAS, RSD, CBIAS and CSD
values (measured as fractions of a pixel). The bias and standard
deviation values are computed from the GRPs. The values in the last
row of each column represent the mean and 95% confidence interval
about the mean. This row is presented for a general descriptive
overview of the results, but should not be interpreted literally in
those cases where the analysis of variance (ANOVA) results are
statistically significant (indicated by *). The results presented
represent the outcome of 40 replicates for each value of N from 8
through 40.

For the whole scene analysis the R values for Kansas are larger
than those for Louisiana (see Table 2). The reason for this is that
the Louisiana scene has large areas of open water in which it is
impossible to choose GCPs. The RMS column shows what appears to be a
counter-intuitive result in that the RMS value goes up as the number
of GCPs utilized increases from 8 to 40. The reason for this appears
to be that as the number of GCPs increases, outlier GCPs are more
likely to be encountered, and these distort the overall RMS value.
Another possible contributing factor is that for N equals 8, there is
only one degree of freedom left over to make the estimate of the mean
and thus the mapping equation lacks the redundancy in GCPs necessary
to make a precise estimate of the mean. The RBIAS and CBIAS values
decrease in magnitude as the number of GCPs used (N) increases. In
this case outliers do not distort the results because there are many
more GRPs used to check rectification accuracy than GCPs employed to
generate the mapping equation (GRPs = total available ground points -
GCPs). The RSD and CSD values are fairly constant in magnitude with
increasing N values. This being the case, it was decided to
concentrate on the RBIAS and CBIAS values in order to decide the
optimum number of GCPs required to register a whole scene of Landsat
data.

The results of Duncan's multiple range test were utilized to
choose the optimum number of GCPs required. The N values for which
Duncan's multiple range test gave significant differences were
separated out from those treatment effects which were non-significant.
The range of N values which were not significantly different was
viewed as delineating the number of GCPs which gave equivalent
results, and the optimum number of GCPs was the lowest N which gave
non-significant results. For a whole scene the optimum number of GCPs
for the Louisiana data was N=24 (RBIAS) and N=16 (CBIAS), while for
Kansas the results were N=24 (RBIAS) and N=32 (CBIAS). The RBIAS, RSD,
CBIAS, and CSD values were roughly the same for both the Louisiana and
Kansas data sets. In both data sets the RBIAS and RSD numbers were
less than the CBIAS and CSD values. Thus, registration is more
accurate in the row direction than in the column direction. Good
rectification accuracy is indicated by the sub-pixel bias and standard
deviation values for both Louisiana and Kansas.

The results of the half scene analysis are presented in Table 3.
The RMS values show the usual trend of increasing as the number of
GCPs utilized increases. The RBIAS and CBIAS values decrease as the N
value increases. For the Louisiana Landsat frame the RMS, RBIAS and
CBIAS values are lower in the half scene analysis than in the whole
scene analysis, but for the Kansas data these parameters are the same
in the half and whole scene analyses. The optimum number of GCPs for
the Louisiana data in the half scene analysis is N=16 (CBIAS and
RBIAS), while the results for Kansas are N=16 (RBIAS) and N=24
(CBIAS). Once again the sub-pixel bias and standard deviation values
indicate that good scene-to-map registration accuracy has been
obtained.

Tables 4 through 6 present the results of the quarter scene
analysis for two different areas (A and B) of the Landsat MSS frame.
Both area A and area B in Louisiana were chosen in such a manner that
a minimal amount of open water was included. For both area A and
area B in Louisiana and Kansas the GCPs were well distributed across
the area covered (given by L, scan lines, and E, elements), which
resulted in R values in excess of 0.94. In Table 4, which presents the
Area B results for Louisiana and Kansas, the R value is higher and the
RMS value is lower than is the case for the half or whole scene
analysis. The RMS values increase in magnitude as the N value
increases, while the bias (row and column) figures decrease in value
as N increases. The explanation for this phenomenon is the same one
that was provided earlier. The optimum numbers of GCPs required to
rectify a quarter of a Landsat scene are N=24 (RBIAS) and N=16 (CBIAS)
for Louisiana, while for Kansas the numbers are N=16 (RBIAS) and N=8
(CBIAS). Once again good scene-to-map registration accuracy is
indicated by sub-pixel bias and standard deviation values.

Tables 5 and 6 compare areas A and B for Louisiana and Kansas.
For Louisiana the RMS, bias, and standard deviation values are higher
in magnitude for area A than they are for area B (see Table 5). These
numbers are the same for areas A and B in Kansas (see Table 6). The
trends in RMS, bias and standard deviation values are the same for
area A for both Kansas and Louisiana as those previously explained for
area B for both Landsat frames. Thus analysis of two different
portions of a Landsat frame yielded similar results, which suggests
that the previous conclusions may not be data specific.


Tables 7 and 8 present the results of correlation and regression
analysis for Louisiana and Kansas for data from the whole scene
analysis. The columns represent the dependent and independent
variables (y and x) in the regression equation; r is the correlation
coefficient (the square of r represents the proportion of the total
variation explained by the regression analysis), m is the slope of the
regression equation, and b is the intercept of the regression
equation. The last column presents the statistical significance of r
(non-significant = N.S.; or significant at the 1% or 5% levels). In
this analysis the r value may be significant at the 1% or 5% levels
because of the large number of replicates employed, but the regression
equation may not be meaningful because of low values of r and r². It
was decided that the latter situation prevailed in these data. It was
concluded that there is no apparent relationship between RMS values
and the bias and standard deviation figures. Furthermore, there is no
apparent relationship between CBIAS and RBIAS or CSD and RSD. This
suggests that all of these variables (RMS, RBIAS, RSD, CBIAS, and CSD)
are independent of one another and that the variables measure
different properties of the scene-to-map registration process. One
would expect this result from the background information discussed in
the introduction.
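
The quantities r, m, and b reported in Tables 7 and 8 follow from ordinary least squares. The sketch below shows one way to compute them in plain Python; it is not the BMDP6D program, and the function name `linear_fit` is hypothetical.

```python
import math


def linear_fit(x, y):
    """Return (r, m, b) for the least-squares line y = m*x + b."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    m = sxy / sxx                      # slope
    b = mean_y - m * mean_x            # intercept
    r = sxy / math.sqrt(sxx * syy)     # correlation coefficient
    return r, m, b


if __name__ == "__main__":
    rms = [94.6, 119.2, 129.0, 132.7, 134.0]   # made-up x values
    rbias = [0.38, 0.20, 0.17, 0.16, 0.14]     # made-up y values
    r, m, b = linear_fit(rms, rbias)
    print(r, m, b, r * r)                      # r*r is the explained fraction
```

Squaring r makes the point of the paragraph concrete: a coefficient of r = -0.193, although significant at the 1% level with 200 replicates, corresponds to r² of about 0.037, so less than 4% of the variation is explained by the regression.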

Conclusions 

The ground control point accuracy experiment quantified the error
associated with choosing GCPs. This error did not seem to differ
between manmade and natural features. The RMS values increased in
magnitude as N increased, while the bias and standard deviation values
decreased as N increased. This result, coupled with the
correlation/regression analysis, suggested that the RMS number
measures a different property of the scene-to-map registration
process (how well GCPs fit the mapping equation) than do the bias
and standard deviation figures. The bias and standard deviation
values should be utilized to estimate the accuracy of the scene
rectification process. It appears from this study that 24 GCPs should
be more than adequate for scene-to-map rectification of portions of a
Landsat frame (a quarter of a scene up to a whole scene) using a
relatively simple linear polynomial as a mapping equation. It is
possible that more complex mapping equations may yield better results,
a consideration if one will be performing scene-to-map registration of
Landsat Thematic Mapper data (30 meter pixels) in the future.

Table 1. Ground Control Points Used in Accuracy Experiment

POINT #  MAP                     NATURAL/MANMADE  DESCRIPTION
191      Savannah SW             Natural          Site is the corner of a forest block
230      Bay St. Louis           Natural          Site is two tributaries joining together
047      Folsom NE               Natural          Site is a field corner against a forest
366      Happy Jack              Natural          Site is two bayous coming together
372      Point a la Hatche (62)  Natural          Site is two marsh bayous coming together
033      Ponchatoula SE          Natural          Site is two marsh bayous coming together
137      Haaswood                Natural          Site is two wetland rivers coming together
371      Lake Batola             Natural          Site is two marsh rivers coming together
209      Malheureux Point        Natural          Junction of two marsh bayous
328      Oak Mound Bayou         Natural          Junction of two marsh bayous
045      Folsom NE               Manmade          Powerline/dirt road junction
001      Franklinton SW          Manmade          Road intersection
171      Logtown                 Manmade          Interstate/4-lane highway intersection
350      Vancleave (62)          Manmade          Highway intersections
317      McHenry (62)            Manmade          Highway/dirt road intersection
275      Bush                    Manmade          Dirt road intersection
279      St. Tammany             Manmade          Pipeline/highway intersection
354      Pascagoula (62)         Manmade          Dirt road/highway intersection
246      Carnes NW               Manmade          Interstate/dirt road intersection
264      Bougalousa NE           Manmade          Highway/dirt road intersection

Table 2.

LOUISIANA - WHOLE SCENE (40 replicates)

N     R               RMS             RBIAS           RSD             CBIAS           CSD
8     0.77            94.58           0.38            0.06            0.82            0.14
16    0.77            119.18          0.20            0.06            0.39            0.12
24    0.73            129.02          0.17            0.06            0.42            0.12
32    0.71            132.72          0.16            0.06            0.36            0.12
40    0.71            133.95          0.14            0.06            0.37            0.12
All   0.74 ± 0.02*    121.89 ± 3.94*  0.21 ± 0.03*    0.060*          0.47 ± 0.06*    0.12 ± 0.002*

*: SIGNIFICANT AT 5% LEVEL IN ANOVA

KANSAS - WHOLE SCENE (40 replicates)

N     R               RMS             RBIAS           RSD             CBIAS           CSD
8     0.86            112.60          0.27            0.07            0.70            0.16
16    0.85            140.80          0.21            0.06            0.44            0.14
24    0.83            144.88          0.17            0.06            0.39            0.14
32    0.82            148.72          0.16            0.06            0.30            0.14
40    0.83            146.30          0.15            0.06            0.27            0.14
All   0.84 ± 0.02     138.66 ± 5.59*  0.19 ± 0.02*    0.06 ± 0.002*   0.42 ± 0.06*    0.14 ± 0.002*

*: SIGNIFICANT AT 5% LEVEL IN ANOVA

Table 3.

LOUISIANA - HALF SCENE (40 REPLICATES)

N     R               RMS             RBIAS           RSD             CBIAS           CSD
8     0.75            84.18           0.19            0.06            0.54            0.14
16    0.70            100.55          0.14            0.05            0.34            0.13
24    0.67            108.65          0.12            0.05            0.31            0.12
32    0.66            110.45          0.10            0.05            0.28            0.12
40    0.66            111.50          0.10            0.05            0.27            0.13
All   0.69 ± 0.02     103.07 ± 2.64*  0.13 ± 0.02*    0.05 ± 0.001*   0.35 ± 0.04*    0.13 ± 0.002*

*: SIGNIFICANT AT 5% LEVEL IN ANOVA

KANSAS - HALF SCENE (40 REPLICATES)

N     R               RMS             RBIAS           RSD             CBIAS           CSD
8     0.90            111.55          0.43            0.08            0.70            0.16
16    0.81            133.88          0.26            0.08            0.58            0.16
24    0.79            140.38          0.22            0.08            0.44            0.16
32    0.80            143.75          0.19            0.08            0.38            0.16
40    0.79            146.30          0.16            0.08            0.33            0.16
All   0.82 ± 0.02*    135.17 ± 6.59*  0.25 ± 0.05*    0.08 ± 0.002    0.48 ± 0.05*    0.16 ± 0.002

*: SIGNIFICANT AT 5% LEVEL IN ANOVA


Table 4

LOUISIANA 1/4 SCENE - AREA B   L: 200 - 1691   E: 700 - 247?   (replicates = 40)

N     R               RMS             RBIAS           RSD             CBIAS           CSD
8     1.06            60.68           0.26            0.06            0.36            0.08
16    1.07            71.70           0.17            0.06            0.22            0.07
24    1.05            75.35           0.10            0.06            0.21            0.07
32    1.00            76.00           0.12            0.06            0.21            0.07
40    1.00            76.58           0.11            0.06            0.17            0.07
All   1.03 ± 0.02     72.06 ± 2.01*   0.15 ± 0.02*    0.062 ± 0.001*  0.24 ± 0.02*    0.074 ± 0.002*
(± 95% CI)

*: ANOVA SIGNIFICANT AT 5% LEVEL

KANSAS 1/4 SCENE - AREA B   L: 500 - 1991   E: 700 - 2474   (replicates = 40)

N     R               RMS             RBIAS           RSD             CBIAS           CSD
8     1.20            53.78           0.26            0.08            0.27            0.13
16    1.15            78.48           0.15            0.07            0.25            0.13
24    1.14            85.38           0.13            0.07            0.27            0.13
32    1.14            87.23           0.12            0.07            0.24            0.13
40    1.12            88.08           0.12            0.07            0.21            0.14
All   1.15 ± 0.02     78.58 ± 5.21*   0.16 ± 0.02*    0.07 ± 0.002    0.25 ± 0.03     0.133 ± 0.002
(± 95% CI)

*: ANOVA SIGNIFICANT AT 5% LEVEL

Table 5.

Louisiana 1/4 Scene - Area A:   L: 1-1500   E: 1-1774

N     R               RMS             RBIAS           RSD             CBIAS           CSD
8     1.05            123.60          0.48            0.12            0.84            0.23
16    1.01            131.58          0.26            0.11            0.53            0.21
24    1.03            136.25          0.21            0.12            0.37            0.22
32    1.04            140.25          0.20            0.12            0.35            0.22
40    1.06            144.85          0.19            0.12            0.35            0.23
All   1.04 ± 0.02     135.30 ± 7.38   0.27 ± 0.04*    0.117 ± 0.002   0.49 ± 0.07*    0.222*

*: ANOVA Significant at 5% level
No. Replicates: 40

Louisiana 1/4 Scene - Area B:   L: 200-1691   E: 700-247?

N     R               RMS             RBIAS           RSD             CBIAS           CSD
8     1.06            60.68           0.26            0.06            0.36            0.08
16    1.07            71.70           0.17            0.06            0.22            0.07
24    1.05            75.35           0.10            0.06            0.21            0.07
32    1.00            76.00           0.12            0.06            0.21            0.07
40    1.00            76.58           0.11            0.06            0.17            0.07
All   1.03 ± 0.02     72.06 ± 2.01*   0.15 ± 0.02*    0.062 ± 0.001*  0.24 ± 0.02*    0.074*

*: ANOVA Significant at 5% Level
No. Replicates = 40

Table 6.

Kansas 1/4 Scene - Area A:   L: 1-1500   E: 1-1774

N     R               RMS             RBIAS           RSD             CBIAS           CSD
8     1.00            79.30           0.21            0.06            0.64            0.14
16    0.96            93.58           0.15            0.06            0.35            0.12
24    0.06            96.90           0.13            0.06            0.29            0.12
32    0.95            97.60           0.10            0.06            0.25            0.13
40    0.94            98.68           0.10            0.06            0.22            0.13
All   0.97 ± 0.02     93.21 ± 2.12*   0.14 ± 0.02*    0.059 ± 0.001   0.35 ± 0.04*    0.130 ± 0.002*

*: ANOVA significant at 5% level
No. Replicates: 40

Kansas 1/4 Scene - Area B:   L: 500-1991   E: 700-2474

N     R               RMS             RBIAS           RSD             CBIAS           CSD
8     1.20            53.78           0.26            0.08            0.27            0.13
16    1.15            78.48           0.15            0.07            0.25            0.13
24    1.14            85.38           0.13            0.07            0.27            0.13
32    1.14            87.22           0.12            0.07            0.24            0.13
40    1.12            88.08           0.12            0.07            0.21            0.14
All   1.15 ± 0.02     78.58 ± 5.21*   0.16 ± 0.02*    0.071 ± 0.002   0.25 ± 0.03     0.133*

*: ANOVA Significant at 5% Level

Table 7. LOUISIANA CORRELATION ANALYSIS - WHOLE SCENE (N = 8 - 40, 200 replicates)

y       x       r         m               b         Significance of r
RBIAS   RMS     -0.193    0.0014          0.379     1%
RSD     RMS     -0.045    -0.906 x 10^-?  0.061     N.S.
CBIAS   RMS     -0.174    0.0027          0.801     5%
CSD     RMS     -0.529    -0.359 x 10^-?  0.164     1%
CBIAS   RBIAS   -0.257    0.548           0.359     1%
CSD     RSD     -0.082    0.276           0.104     N.S.

KANSAS CORRELATION ANALYSIS - WHOLE SCENE (N = 8 - 40, 200 replicates)

y       x       r         m               b         Significance of r

Table 8. LOUISIANA CORRELATION ANALYSIS - WHOLE SCENE (N = 40; 40 replicates)

y       x       r         m               b         Significance of r
RBIAS   RMS     0.051     0.466 x 10^-?   0.076     N.S.
RSD     RMS     -0.448    -0.706 x 10^-?  0.069     1%
CBIAS   RMS     -0.004    -0.757 x 10^-?  0.075     N.S.
CSD     RMS     -0.322    -0.165 x 10^-?  0.137     5%
CBIAS   RBIAS   -0.013    -0.028          0.374     N.S.
CSD     RSD     0.162     0.526           0.084     N.S.

KANSAS CORRELATION ANALYSIS - WHOLE SCENE (N = 40; 40 replicates)

y       x       r         m               b         Significance of r
RBIAS   RMS     0.349     0.0012          -0.031    5%
RSD     RMS     -0.023    -0.124 x 10^-?            N.S.
CBIAS   RMS     0.144     0.0010          0.124     N.S.
CSD     RMS     -0.311    -0.708 x 10^-?  0.148     5%
CBIAS   RBIAS   0.054                     0.258     N.S.
CSD     RSD     -0.173                              N.S.

Empirical studies of digital images derived by scanning
air photos and through acquiring aircraft and spacecraft
scanner data show that spatial structure in scenes can be
measured and logically related to texture and image
variance. Local variance, measured as the average standard
deviation of brightness values within a three-by-three
moving window, reaches a peak at a resolution cell size about
two-thirds to three-fourths the size of the objects within
the scene. If objects are smaller than the resolution cell
size of the image, this peak does not occur and local
variance simply decreases with increasing resolution cell
size as spatial averaging occurs. Variograms, which measure
the average squared difference in pairs of brightness values
as a function of the distance and direction between them, can
also reveal the size, shape, and density of objects in the
scene.

INTRODUCTION

This paper presents the continuation of research
described in the 1982 issue of these Proceedings [1]. To
avoid redundancy, the reader is referred to that paper for a
more thorough background. The primary research goal is to
develop a better understanding of spatial patterns in image
data as they relate to the characteristics of the ground
scene. The long term objective is to develop methods of
scene inference that directly exploit the relationship
between the ground scene and the spatial patterns in images.
Before that goal can be accomplished, an improved
understanding of the meaning of the results of various
measures of spatial pattern must be developed. In particular,
identifying those characteristics of the ground scene that
can be recovered from measures of spatial pattern is of
interest.

In  this  paper,  images  from  a variety  of  environments 
and  spatial  resolutions  are  examined  using  two  methods  of 
measuring  spatial  pattern.  In  addition,  a new  direction  of 
research  that  involves  simulating  remotely  sensed  images 
will  be  discussed.  The  use  of  simulated  images  allows  for 
control  of  the  ground  scene,  which  aids  the  interpretation 
of  spatial  pattern  measurements. 

The two methods used to measure spatial patterns are
(1) graphs of local variance as a function of spatial
resolution, and (2) two-dimensional variograms. Local variance
is measured in images as the mean value of a texture image.
The texture value at a given pixel is the standard deviation
of the surrounding 3 x 3 window of pixels. Note that this
definition is only one of many possible definitions of
texture [2]. To evaluate local variance over a range of
spatial resolutions, the imagery was degraded to successively
coarser resolutions by simply averaging the resolution cells
to be combined into a new, larger resolution cell.
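
The local variance measure and the degradation step can be sketched as follows. This is an illustration under stated assumptions rather than the software used in the study: images are plain lists of rows, the function names are hypothetical, the standard deviation is taken over all nine window pixels (divisor 9), border pixels without a full 3 x 3 window are skipped, and rows or columns that do not fill a complete block are dropped when degrading.

```python
import math


def local_variance(img):
    """Mean of the 3 x 3 window standard deviations over interior pixels."""
    rows, cols = len(img), len(img[0])
    if rows < 3 or cols < 3:
        raise ValueError("image must be at least 3 x 3")
    total, count = 0.0, 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [img[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            mu = sum(window) / 9.0
            total += math.sqrt(sum((v - mu) ** 2 for v in window) / 9.0)
            count += 1
    return total / count


def degrade(img, factor):
    """Average non-overlapping factor x factor blocks into single cells."""
    rows, cols = len(img) // factor, len(img[0]) // factor
    out = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            block = [img[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            new_row.append(sum(block) / len(block))
        out.append(new_row)
    return out


if __name__ == "__main__":
    # 12 x 12 checkerboard of single-pixel "objects" (made-up test scene)
    scene = [[(r + c) % 2 for c in range(12)] for r in range(12)]
    for factor in (1, 2, 4):
        print(factor, local_variance(degrade(scene, factor)))
```

Plotting the returned value against the cell size for a sequence of factors gives a graph of local variance as a function of resolution.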

An alternative method of examining spatial structure in
images is through the variogram [3]. The variogram is
calculated as the mean squared deviation between two pixels a
given distance and direction apart. This can be thought of
as a measure of the expected difference between two pixels
given the spatial relationship between them. The results of
these calculations are plotted as two-dimensional contour
plots. A more detailed description of both methods can be
found in last year's paper.
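
A direct, unoptimized sketch of the variogram calculation is given below. The function names and the list-of-rows image representation are assumptions made for illustration. The value returned is the mean squared difference as defined in the text; many geostatistical texts instead report the semivariogram, which is half of this quantity.

```python
def variogram(img, d_row, d_col):
    """Mean squared difference between pixels separated by (d_row, d_col)."""
    rows, cols = len(img), len(img[0])
    total, count = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:   # keep only pairs inside the image
                total += (img[r][c] - img[r2][c2]) ** 2
                count += 1
    if count == 0:
        raise ValueError("lag is larger than the image")
    return total / count


def variogram_surface(img, max_lag):
    """Variogram for every lag vector within +/- max_lag, keyed by (d_row, d_col)."""
    lags = range(-max_lag, max_lag + 1)
    return {(dr, dc): variogram(img, dr, dc) for dr in lags for dc in lags}


if __name__ == "__main__":
    ramp = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]  # made-up image
    surface = variogram_surface(ramp, 2)
    print(surface[(0, 1)], surface[(0, 2)], surface[(1, 0)])
```

Contouring the values of `variogram_surface` over the lag plane yields a two-dimensional plot of the kind described here; a surface that rises more slowly in one direction than another indicates anisotropy.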

Another brief note of background concerns the
description of ground scenes. The concept of a scene model, or
the development of a generalization of the nature of a scene,
is essential to this project. A scene can be described as
being composed of objects on a plane, or as in our case, as
elements on a background. Scene models can have numerous
types of elements and be very complex, and even have a
nested structure in which smaller elements are used to
describe or define other elements. Reference to the
elements in the scene and their characteristics occurs
throughout the discussion of the results.

RESULTS

Both methods of measuring spatial pattern in image data
were used to evaluate spatial patterns acquired from three
types of environments in images at two resolutions. Imagery
was analyzed at two different resolutions in order to be
able to cover a wider range of resolutions in the local
variance/resolution graphs and to help illustrate that the
formulation of scene models is related to the resolution of
the data. The imagery at very fine resolutions for each
environment was digitally scanned from aerial transparencies
using a microdensitometer, thus allowing analysis of spatial
pattern at finer resolutions than are available from
conventional spaceborne sensors. Thematic Mapper (TM) or
Thematic Mapper Simulator (TMS) data were used as the coarse
resolution data for each environment. The three types of
environments used in the analysis are forested, large-field
agricultural, and urban/suburban. The presentation of the
results and their discussion is organized around the
individual images analyzed, beginning with the finer
resolution imagery for each environment.


South Dakota Forest Image

The  color  aerial  transparency  of  the  area  used  to 
create  this  image  was  obtained  from  the  Nationwide  Forestry 
Applications  Program  office  at  the  Johnson  Space  Center  in 
Houston,  Texas.  The  exact  location  of  the  area  in  South 
Dakota  covered  by  the  photograph  is  unknown,  but  it  serves 
as  a good  example  of  a simple  forested  environment  composed 
of  trees  on  a relatively  uniform  background. 

Figure 1A-D shows the digitized photograph at the
original resolution and as averaged to calculate the graph of
local variance as a function of resolution (Figure 2). The
graph shows that local variance is low at the resolution
at which the photo was scanned, 0.75 m (Figure 1A). At this
resolution, if a pixel falls on a tree, its immediate
neighbors are also likely to be on the tree, since many pixels
comprise individual trees. In this situation, the pixels in
a 3 x 3 window are likely to have similar values and the
local variance will be low. Similarly, if a pixel lies on
the background, its neighbors are also likely to be on the
background, and local variance will again be low. Naturally,
some windows will fall along the borders of the trees
or background, and as a result will have high local
variance, but the mean local variance for the image will still
be low.

Figure 1. South Dakota forest image shown at resolutions as
scanned (0.75 m) (A), and as averaged to yield resolutions
of 3 m (B), 6 m (C), and 9.0 m (D).

Figure 1, continued. South Dakota forest image, scanned and
averaged to resolutions of 12.0 m (E) and 16 m (F).


SPATIAL  RESOLUTION 


Figure  2.  Local  variance  (average  standard  deviation  within 
a three-by-three  window)  as  a function  of  resolution  cell 
size  for  the  South  Dakota  forest  image. 


As the size of individual resolution cells increases,
the number of pixels comprising an individual tree
decreases, and the likelihood that surrounding pixels will
be similar decreases (Figure 1B). In this situation, local
variance increases. This trend continues until the pattern
becomes very mottled and a peak in local variance is
observed at 6 m (Figure 1C). While it was originally
hypothesized that local variance would peak at the size of
the elements in the scene, the observed peak occurs when the
resolution cells are somewhat smaller than the trees in the
scene. Close examination of the size of trees reveals an
average size of between 8 and 9 meters, whereas local
variance peaks at about 6 meters. Thus, there is not a simple
relationship between local variance, spatial resolution, and
the size of elements. An explanation for the location of
the peak in local variance could not be determined from this
image alone, but became better understood after viewing
graphs from different environments and after the image
simulation phase of the project.

As the resolution cell size increases past this peak, local
variance decreases. This decrease occurs as individual pixels
come to include a mixture of both trees and background. As
this mixing increases, the general contrast in the image
decreases and pixels begin to look more like their
neighbors. Local variance thus continues to decrease (Figures
1D-1F).

There is considerable structure in the contour plot of
the variogram derived from the South Dakota forest image
(Figure 3). The strength of the relationship between a
given pixel and its neighbors tends to decrease with
distance until it reaches the sill, or the level of no
interaction, at about the eighth contour line. At this distance,
the relationship between pixels is essentially as if they
were selected at random. Ideally, this portion of the
contour plot should be flat, but it appears to have local peaks
and pits. This effect may be due to the fact that the
contour plot is derived from an estimated variogram. With
increased sampling, this mottled appearance may be reduced
or even disappear. The zone of influence in the variogram
seems related to the size of the elements (trees) in the
scene, as the width of the area inside the sill approximates
twice the size of a tree in the image.

Figure 3. Two-dimensional variogram of the South Dakota
forest scene. Units are pixels at original resolution (0.75
m).

Another notable feature of the variogram is its
anisotropy, which is attributable to the shadowing in the image
(Figure 1A). The variogram is markedly elongated along a
diagonal  from  the  upper  right  corner  to  the  lower  left 
corner,  which  corresponds  to  the  orientation  of  illumina- 
tion. Since  shadows  look  more  like  trees  than  background, 
the  shadow  of  a tree  tends  to  reduce  the  variance  measured 
in  the  direction  of  the  shadow. 
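
A two-dimensional variogram of the kind shown in the contour plots can be estimated directly from pixel differences. The sketch below is a minimal illustration, assuming the usual empirical semivariance (half the mean squared difference between pixels separated by a given lag); the function name `variogram_2d` and its arguments are hypothetical, and the estimator actually used for the figures may differ in detail.

```python
import numpy as np

def variogram_2d(image, max_lag):
    """Empirical semivariance for every lag (dy, dx) with |dy|, |dx| <= max_lag.

    The result has shape (2*max_lag + 1, 2*max_lag + 1); lag (0, 0) is at the center.
    """
    z = image.astype(float)
    rows, cols = z.shape
    size = 2 * max_lag + 1
    gamma = np.zeros((size, size))
    for dy in range(-max_lag, max_lag + 1):
        for dx in range(-max_lag, max_lag + 1):
            # Overlapping parts of the image and of its copy shifted by (dy, dx).
            a = z[max(0, dy):rows + min(0, dy), max(0, dx):cols + min(0, dx)]
            b = z[max(0, -dy):rows + min(0, -dy), max(0, -dx):cols + min(0, -dx)]
            gamma[dy + max_lag, dx + max_lag] = 0.5 * np.mean((a - b) ** 2)
    return gamma
```

Contouring the returned array gives a plot centered on zero lag. An elongated zone of low values along one direction corresponds to the kind of anisotropy attributed above to shadowing.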

Canoga  Park  Residential  Image 

The image of a housing development in Canoga Park,
California, was obtained through NASA Ames Research Center
(Figure 4A). The data were collected by multispectral
scanner; the red band was used for this analysis. This
scene presents an interesting change from the forest
environment in that it is a complex scene, having several
kinds of elements. Associated with the complex nature of
the scene is a change in the way the scene is organized. In
this environment there is not a well-developed background
similar to that of the forest environment. Instead, there are
several different kinds of elements that are arranged in a
mosaic to compose the scene. The most obvious elements in
the real scene are houses, trees, streets, lawns, and cars.
However, close examination of a blowup of the image (Figure
4B) reveals three kinds of elements: houses (actually their
roofs), streets, and vegetation or very dark areas.
Vegetation covers most of the spaces between the houses and the
streets. While it is undoubtedly composed of many types of
plants with different life-forms, they all appear very dark
in the image and cannot be differentiated. Because of their
dark appearance, shadows cannot be distinguished from the
vegetation either, which leads to a description of the
scene using three elements.

Figure 4. Red band of multispectral scanner image of a
housing tract in Canoga Park, CA. (A) Portion of image;
(B) enlargement showing detailed structure.


Figure 5 shows the graph of local variance as a
function of resolution for this image. This graph is similar in
appearance to the graph for the forest scene in that the
local variance is low at the original resolution of the
data, rises to a peak, and then decreases. However, the
general shape of the curve is different, and the resolution
at which local variance peaks is different. The
curve connecting the points on the graph is broader and does not
have as sharp or as well-defined a peak. The broad nature
of the graph is probably attributable to the complex nature
of the scene, with different elements being of different
sizes. The broad distribution of sizes results in high
local variance over a wider range of spatial resolutions.
The peak in local variance occurs at about 13-15 m, or at
five or six times the original resolution of the imagery.
The general size of the elements again is larger than the
spatial resolution at which local variance peaks. The average
size of houses is approximately 14 pixels in the original
image, while the streets are approximately 11 pixels wide
and the spaces between houses and between houses and the
streets average about 6-8 pixels.

Figure 5. Local variance as a function of resolution cell
size for the Canoga Park residential image. (Horizontal axis:
spatial resolution in meters.)


The variogram of the Canoga Park image is nearly
circular in the zone of influence (Figure 6). This isotropy
indicates that there are not any well-defined directional
effects in the image. One can see in Figure 4 that the
roads run in several directions in the scene. If one or
more directions predominated, there could easily be
anisotropy in such an image, namely reduced variance in the
direction in which the roads are oriented. Also, there could be
effects related to the shapes of houses that might be recovered
through anisotropies in the variogram.

Figure 6. Two-dimensional variogram of Canoga Park
residential scene. Units are pixels at original resolution (30 m).

The sill of the variogram probably falls somewhere
between the fifth and the sixth contour lines, although no
attempt was made to define the sill formally. The fifth
contour line is the last one to hold definite structure and
lies about 8 or 9 pixels from the center, indicating the
existence of elements in the image that are at least that
large. This size agrees reasonably well with the sizes of
objects in the scene.


Agricultural Image

The imagery used for the computation of data for the
local variance function was scanned at a resolution of
approximately 0.15 m (Figure 7A). Although they are not
shown in the figure, the image includes portions of two other
fields. Such fine resolution was used because it was
hypothesized that an individual agricultural field could be
characterized as being composed of elements such as
individual plants, crop rows, and a background of soil. In this
formulation, if the resolution cells were smaller than
individual plants, or the width of a crop row, then the initial
local variance would be low. As spatial resolution
increased to approximately the size of the elements, an
increase in local variance would be expected, similar to the
findings in the forest and residential environments. Local
variance would then be expected to decline as spatial
resolution increased further.

Figure 7. Agricultural image showing row structure. (A)
Portion of image; (B) detail.

The observed results (Figure 8) do not follow the
hypothesized form because the imagery was not scanned at a
resolution fine enough for individual elements (rows,
shadows, and furrows) to be characterized by many pixels.
Instead, the graph begins with local variance already high.
The distance between crop rows is approximately 5 resolution
cells at the original resolution of 0.15 m. Those five
pixels include the well-illuminated portion of the crop
row, the shaded side of the crop row, and the soil furrow
between the rows. As a result, very few 3 x 3 windows in
the image will have low variance. This problem can be seen
in a blowup of a portion of the image shown in Figure 7B.
If resolution were considerably finer, variance within both
the shaded and well-illuminated portions of a single crop
row would be low. A spatial resolution on the order of 5 cm
would be required for this effect to be observed. Another
factor that may contribute to the lack of initial low
variance is that the crop is in a mature stage, and the crop
rows have grown close together. Thus, there is not a
well-developed background signal between rows, against which the
crop rows would be highly contrasting.

Figure 8. Local variance as a function of resolution cell
size for the row-crop image.

Another noteworthy feature of the local variance graph
is the rapid decline past the peak to a very low level.
This feature is the result of a scene that becomes very
homogeneous once the resolution cells are larger than the
crop rows.

Variograms were computed for two different agricultural
fields in the image and then for the entire image as a whole.
These variograms exhibit considerable structure related to
the orientation and spacing of the rows. Figure 9 shows the
variogram of a portion of the field shown in Figure 7. From
the variogram it is easy to determine both the direction of
the rows and their spacing. The crop rows are oriented
horizontally in this portion of the image, as can be seen by
the low variance associated with horizontal movement in the
image. Variance changes sharply with movement across the
rows, with variance increasing up to one half of the
distance between rows. From that point, variance decreases
until a minimum is reached at the distance between rows.
This cycle of high variance at half-widths and low variance
at whole multiples of the distance between rows is repeated
all the way to the edges of the variogram, and would
continue if the variogram had been calculated for a larger
window size. It arises from the repetitive pattern
in the image itself produced by the row structure. The
distance between rows can be determined by counting the
number of pixels between the ridges or valleys in the
variogram.

Figure 9. Two-dimensional variogram of a portion of the
agricultural field shown in Figure 7. Units are pixels at
original resolution cell size (15 cm).

Figure 10. Variogram of a portion of the agricultural scene
showing "vertical" row structure.
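
The idea of reading row spacing from the valleys of the variogram can be illustrated on synthetic data. The sketch below is not the procedure used for the figures; it assumes a hypothetical test scene of horizontal rows repeating every 5 pixels with added noise, computes the semivariance for lags across the rows, and reports the lag of the first valley. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def cross_row_variogram(image, max_lag):
    """Semivariance for vertical lags 1..max_lag (movement across horizontal rows)."""
    z = image.astype(float)
    return np.array([0.5 * np.mean((z[lag:, :] - z[:-lag, :]) ** 2)
                     for lag in range(1, max_lag + 1)])

def first_valley(gamma):
    """Lag (1-based) of the first interior local minimum of the variogram, or None."""
    for i in range(1, len(gamma) - 1):
        if gamma[i] < gamma[i - 1] and gamma[i] <= gamma[i + 1]:
            return i + 1
    return None

# Hypothetical test scene: horizontal rows repeating every 5 pixels, plus noise.
rng = np.random.default_rng(0)
profile = np.cos(2 * np.pi * np.arange(100) / 5)
scene = profile[:, None] + rng.normal(0.0, 0.5, size=(100, 100))

gamma = cross_row_variogram(scene, 12)
spacing = first_valley(gamma)   # estimated distance between rows, in pixels
```

For this synthetic scene the semivariance is highest near half the row spacing and lowest at the spacing itself, which mirrors the cycle of ridges and valleys described above even though the individual pixels are noisy.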

For the field in the lower left portion of the image,
the variogram (Figure 10) exhibits structure similar to the
previous variogram except that the row direction is rotated
90 degrees. The same pattern of ridges and valleys occurs
at the same spacing between rows. The pattern in the
variogram for the entire agricultural image (Figure 11) is
easier to understand after looking at the variograms for the
individual fields. The variogram for the entire image can
be thought of as the result of superimposing the variograms
from fields with rows oriented perpendicularly.

Figure 11. Variogram for entire agricultural image showing
combined effects of orthogonal rows.

The variograms for these agricultural fields illustrate
the strength of the method in revealing underlying
structure in images. Figure 7A shows a large portion of the
lower left agricultural field used to estimate the variogram
in Figure 9, and the linear structure can be easily seen in
this picture. However, Figure 7B is a blowup of a portion
of this image and illustrates how noisy this linear
structure is. The variogram estimated from the image clearly
identifies the linear structure despite the large noise
component in the image. These results suggest the similarity
between the variogram and spectral analysis, which is
another method of finding periodicities in data.
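
The link to spectral analysis can be made explicit under one assumption not stated in the text, namely that the image Z is second-order stationary with autocovariance C and power spectrum S. In that case the standard relations are:

```latex
\gamma(\mathbf{h})
  = \tfrac{1}{2}\,\mathrm{E}\!\left[\bigl(Z(\mathbf{x}+\mathbf{h}) - Z(\mathbf{x})\bigr)^{2}\right]
  = C(\mathbf{0}) - C(\mathbf{h}),
\qquad
C(\mathbf{h}) = \int S(\mathbf{f})\, e^{\,2\pi i\,\mathbf{f}\cdot\mathbf{h}}\, d\mathbf{f}
```

Under this assumption, a row pattern with spacing d contributes a term proportional to cos(2*pi*h/d) to C for lags h across the rows, so the variogram is lowest at h = d, 2d, ... and highest at h = d/2, 3d/2, ..., which matches the ridges and valleys described for Figures 9 and 10.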

Thematic  Mapper  Agricultural  Image 

The image used to analyze spatial patterns in an
agricultural scene at coarse resolution is a Thematic Mapper
image (Band 3) obtained from Johnson Space Center (Figure
12). This image is from the area near the corners of
Missouri, Louisiana, Kentucky, and Tennessee, and the subimage
selected is west of the Mississippi River. This area is
ideal for this project because the scene is composed almost
entirely of agricultural fields. In addition, many of the
fields are planted in different crops or are at different
stages of development, and thus are contrasting in
appearance.

Figure 12. Midwestern agricultural scene imaged by Thematic
Mapper (A). (B) Detail.

The graph of local variance as a function of resolution
has a form similar to that for the fine-resolution forest scene,
although it covers a different range of spatial resolutions
(Figure 13). Local variance starts reasonably low at the
original resolution of the data (30 m), but increases to a
broad, general peak at about 240 m, and then begins a
gradual decline. This shape indicates that the elements in the
scene are larger than the resolution cells of the original
data. In this scene, there are a variety of field sizes and
shapes, but the most common field size is a quarter-section,
which is 14 resolution cells on a side at the original
resolution of the data (Figure 12B). Thus, the peak occurs
before the size of the elements in this scene, similar to
the findings with the fine-resolution forest image.

Figure 13. Local variance as a function of resolution cell
size for the Thematic Mapper agricultural image.

The variogram derived from this image is shown in
Figure 14. The variogram generally decreases as a function of
distance, reflecting the homogeneity within fields. The
size of the zone of influence is related to the size of the
fields in the image, and there is a slight anisotropy in the
variogram. This anisotropy is related to the general trend
toward rectangular fields in the image that are longer in
the north-south direction than the east-west direction.
This characteristic can be detected in Figure 12A.

Figure 14. Two-dimensional variogram for Thematic Mapper
midwestern agricultural image. Units are pixels at original
resolution (30 m).

Washington, D.C. Thematic Mapper Image

The area used for the calculation of the local
variance/resolution graph and the variogram in this TM image
is taken from the city of Washington (Figure 15). Thus, it
is not as simple or as well-defined a scene as the previous
examples. The graph of local variance as a function of
spatial resolution (Figure 16) does not have the familiar
structure of initial low values, a peak, and eventual
decline. Instead, there is a general decline in the local
variance over the range of spatial resolutions covered by
the graph. This indicates that the elements in the scene
are generally smaller than the original resolution of the
data. There are some multipixel objects in the image, as
can be seen in Figure 15B. These blocks of bright pixels
are large buildings and may help explain the flat beginning
of the graph in Figure 16. Other than these large
buildings, there are few homogeneous regions in the image. In
particular, the residential area of southeast Washington in
the left part of Figure 15 has a mottled and random
appearance.

Figure 15. Portion of Thematic Mapper image of Washington,
D.C. (A) with detail (B). (Images are reversed
left-for-right.)

Figure 16. Local variance as a function of spatial
resolution for the Thematic Mapper image of Washington, D.C.
(Horizontal axis: spatial resolution in meters.)

The variogram derived from the Washington, D.C. image
does not exhibit any particularly interesting
structure (Figure 17). The small size of the zone of
influence indicates the relatively small size of the few
elements that are larger than the resolution cells in this
scene. There is no obvious explanation for the complex
shape defined by the fifth contour line. There are not any
features that have a similar shape or whose combined
orientations would produce this pattern. These variations are
most likely due to random effects. However, it is possible
that they represent subtle characteristics of the image.

Figure 17. Two-dimensional variogram of Thematic Mapper
image of Washington, D.C. Units are pixels at original
resolution (30 m).

Klamath  Forest  Thematic  Mapper  Simulator  Image 

As an example of a forested scene at 30-meter
resolution, a Thematic Mapper Simulator image of a portion of the
Klamath National Forest was obtained from Ames Research
Center. (Unfortunately, prints of this digital image were
not available at the time of preparation of this
manuscript.) The results of the two methods of measuring
spatial pattern for this image are similar to the results
for the Washington, D.C. image. The graph of local variance
as a function of spatial resolution shows a marked decline
as resolution increases (Figure 18). The results indicate
that there are not any spatially homogeneous elements in the
image that are composed of many trees. Initially, it was
expected that stands of trees, which can be identified by
human interpreters, might cause a second peak in local
variance at a resolution related to the size of the stands.
However, such a peak did not occur for this image.

Figure 18. Local variance as a function of resolution cell
size for a Thematic Mapper Simulator image of a portion of
the Klamath National Forest. (Horizontal axis: spatial
resolution in meters.)

The drafting of the variogram for this image was not
completed in time for this publication. However, it looks
very similar to the results for the Washington, D.C. image,
exhibiting a small zone of influence. In addition, there
are no well-defined anisotropies that reveal any directional
orientations to the elements in the scene. The variogram
thus confirms the conclusion that there are no large,
spatially homogeneous elements obvious within the image.

Image  Simulation 

The results presented in the last section are
interpreted in an intuitive manner. The empirical nature of
remotely sensed images makes it difficult to control scene
parameters in a way that allows experimentation to verify or
help clarify the meaning of the results. Thus, it became
important to develop a method of acquiring images from scenes
with known characteristics. One way to approach this
problem is to simulate ground scenes and model their reflectance
characteristics to produce an image. This approach has
several advantages. First, it allows complete control, and
thus knowledge, of the ground scene. Second, simulation
allows examination of simple scenes, which is important for
developing a solid foundation in such an exploratory and
empirical study.

Forest  scenes  were  selected  to  serve  as  the  basis  of 
the  image  simulation  phase  of  the  project.  Forests  were 
selected  for  several  reasons.  Past  remote  sensing  research 
experience  in  forestry  directly  contributed  to  the  develop- 
ment of  the  ideas  for  this  project.  Also,  forests  present  a 
simple  scene  model  that  can  be  simulated  relatively  easily. 
Finally, as part of another line of research by Strahler and Li
[4], a simulation program existed that could be modified for
the purposes of this project.

The image simulation procedure is based on Monte Carlo
methods and uses a two-resolution concept. The first level
of resolution is the size of the units in the ground scene
at which elements are differentiated. The second is the
level of aggregation used to simulate the image. For the
simulated image used in this project, these two resolutions
were the same, which means that each resolution cell was
assigned to a single type of element (crown, shadowed crown,
understory, or shadowed understory). This approach produced
images similar to the fine-resolution imagery that was
scanned from aerial photography.

The  ground  scene  is  modeled  as  trees  on  a plane.  The 
trees  are  conical  in  shape  with  a known  apex  angle  and  a 
lognormal height distribution. The apex angle used is based
on the results of field measurements. The lognormal
height distribution was selected on the basis of
results presented in the ecological literature. Field
measurements of tree heights have confirmed the lognormal shape
of the distribution, but have illustrated the variability of
the means and variances characterizing the distributions.

The trees (or cones) are distributed randomly on the
surface with one exception: the center of a new tree is
not located within the cone of a previously located tree.
This modification to the random model was based on the
expectation that competition between trees would make
it less likely to find trees very close together
than the random model would produce [5]. Subsequent
field measurements have not supported this hypothesis, and
have indicated that the simple random model is a good
approximation to spacing in conifer stands.
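
The placement rule described above can be sketched as a small Monte Carlo routine. This is an illustrative reconstruction, not the simulation program referred to in the text: the function name `simulate_stand`, its parameters, and the use of the half apex angle to convert height to base radius are all assumptions made for the example.

```python
import math
import random

def simulate_stand(n_trees, side, mu, sigma, half_apex_deg, seed=0, max_tries=100000):
    """Place cones at random on a side x side plane with lognormal heights.

    A candidate center is rejected if it falls inside the base of a cone that
    was placed earlier (the single exception to pure randomness).
    Returns a list of (x, y, height, base_radius) tuples.
    """
    rng = random.Random(seed)
    tan_half = math.tan(math.radians(half_apex_deg))
    trees = []
    tries = 0
    while len(trees) < n_trees and tries < max_tries:
        tries += 1
        x, y = rng.uniform(0, side), rng.uniform(0, side)
        if any(math.hypot(x - tx, y - ty) < r for tx, ty, _, r in trees):
            continue  # center lies within an existing cone: reject and redraw
        height = rng.lognormvariate(mu, sigma)   # mu, sigma describe log(height)
        trees.append((x, y, height, height * tan_half))
    return trees

# Example with hypothetical values: 50 trees, median height 10 units.
stand = simulate_stand(50, 100.0, math.log(10.0), 0.2, 10.0, seed=1)
```

Dropping the rejection test gives the simple random model that, according to the field measurements mentioned above, is itself a good approximation for conifer stands.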

Following the random location of trees and the
lognormal assignment of their heights, an elevation map is created
with the heights of the surface above the base level
representing the height of the forest canopy. By specifying
a direction of illumination and a solar zenith angle,
shadows are produced. The result is the definition of four
kinds of surfaces in the scene: trees, background, shadowed
trees, and shadowed background. From these, a digital image
can be synthesized that resembles an image drawn from a real
scene (Figure 19).
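
The elevation map and shadowing steps can be sketched as follows. This is a simplified illustration under several assumptions: trees are supplied as hypothetical (x, y, height, base_radius) tuples, cells are one unit on a side, shadows are found by stepping from each cell toward the sun and checking whether any surface cell rises above the line of sight, and the zenith angle lies strictly between 0 and 90 degrees. The function names are not from the original program.

```python
import numpy as np

def canopy_height_map(trees, size):
    """Height of the cone surfaces above the plane on a size x size grid."""
    yy, xx = np.mgrid[0:size, 0:size] + 0.5          # cell centers
    surface = np.zeros((size, size))
    for x, y, height, radius in trees:
        dist = np.hypot(xx - x, yy - y)
        cone = height * np.clip(1.0 - dist / radius, 0.0, None)
        surface = np.maximum(surface, cone)           # tallest cone wins
    return surface

def shadow_mask(surface, sun_dx, sun_dy, zenith_deg, max_steps=None):
    """True where the line of sight toward the sun is blocked.

    (sun_dx, sun_dy) is the horizontal direction toward the sun in grid units.
    """
    size_y, size_x = surface.shape
    norm = np.hypot(sun_dx, sun_dy)
    ux, uy = sun_dx / norm, sun_dy / norm
    rise = 1.0 / np.tan(np.radians(zenith_deg))       # ray height gained per unit distance
    steps = max_steps or max(size_y, size_x)
    yy, xx = np.mgrid[0:size_y, 0:size_x]
    shadowed = np.zeros(surface.shape, dtype=bool)
    for k in range(1, steps + 1):
        sx = np.rint(xx + k * ux).astype(int)
        sy = np.rint(yy + k * uy).astype(int)
        inside = (sx >= 0) & (sx < size_x) & (sy >= 0) & (sy < size_y)
        blocker = np.zeros_like(surface)
        blocker[inside] = surface[sy[inside], sx[inside]]
        shadowed |= inside & (blocker > surface + k * rise)
    return shadowed

def classify(surface, shadowed):
    """0 = background, 1 = tree, 2 = shadowed background, 3 = shadowed tree."""
    return (surface > 0).astype(int) + 2 * shadowed.astype(int)
```

Assigning a brightness value to each of the four classes would then give a synthetic image of the kind described; the brightness values themselves are not specified here and would have to be chosen separately.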

The graph of local variance as a
function of resolution for a simulated forest image is shown in
Figure 20. The size of the image that was simulated limits
the  number  of  times  that  spatial  resolution  can  be  degraded. 
Thus,  it  was  not  possible  to  evaluate  local  variance  for  the 
full  range  of  resolutions  used  in  the  scanned  forest  image. 
However,  the  shape  of  the  curve  is  very  similar  to  the 
results  for  the  scanned  forest  image  (Figure  2).  There  is  a 
prominent  peak  at  6 m and  then  a decline  in  local  variance. 

The diameter of the tree crowns in the simulated image
has a mean of 7 m and a very low variance (approximately 0.5
m). However, because the shadows look more like the trees
than the background, the effect of shadows is to make the
trees appear elongated in the direction opposite the
illumination source. If shadows were considered part of a single
dark element with trees, then their size along one axis
would be 11 m. Thus, the peak in local variance occurs at a
size somewhat lower than that of the elements in the scene,
similar to the results obtained for the forest image.

Figure 19. Portions of simulated forest images. On left,
simulated image. On right, texture image derived from it.
(A) Image as simulated at 1-meter resolution. Other photos
show image degraded to 3-meter cells (B); 6-meter cells (C);
and 9-meter cells (D).

Figure 20. Local variance as a function of resolution cell
size for the simulated forest image.

The reason that local variance peaks for this image
before the size of the elements is reached can be better
understood by examining the changes in both the simulated
image and the texture image derived from it as the image is
degraded to coarser resolutions. To display this process, a
series of pictures with portions of both the simulated image
and its associated texture image are placed side by side in
Figures 19A-D. The first picture (Figure 19A) shows the
simulated image at its original resolution. In the texture
image, high local variance occurs primarily around the
perimeter of the tree and its shadow, so that the texture
image behaves like an edge detector. The area inside the
perimeter still has relatively low local variances and the
area between trees is black, as the background has the same
value in all locations.

Figure 19B shows the results after the image has been
degraded to 3 m. At this resolution the trees cannot be
distinguished from their shadows and begin to appear out of
focus. The dark areas inside trees in the texture image
have disappeared because the size of the trees in terms of
number of pixels has decreased. Similarly, the distances
between trees shrink and local variance becomes
increasingly influenced by effects from neighboring trees.
Comparison with the first texture image (Figure 19A) shows a
larger area covered by bright values, indicating high local
variance.

The resolution of peak local variance (6 m) is shown in
Figure 19C. At that resolution, trees have become very
small, and a large area of the texture image is bright,
indicating high local variance. An interesting
characteristic, which becomes very important, is that there are a
considerable number of pixels with intermediate values in the
texture image. In the previous texture images, pixels were
either near edges and very bright, or in homogeneous areas
and very dark. These intermediate texture values are the
result of the effect of the degradation of resolution on the
appearance of trees.

At a resolution of 9 m, local variance has begun to
decline (Figure 19D). The texture image has begun to look
like a continuous tone image, quite different from the edge
detector in Figure 19A. While a greater proportion of the
texture image has values other than black, the mean value of
the image is lower. This observation is the key to
understanding the reason that local variance peaks before the
size of the elements. As the imagery is degraded, the
appearance of a tree differs from what was originally
expected. As the resolution cells become larger, trees tend
to look more and more out of focus, with many pixels being
composed of a mixture of dark tree or shadow and light
background. Thus, as the size of a tree is approached,
instead of having alternating light and dark pixels for tree
or background, there are several intermediate-tone pixels.
The reflectance of any given tree is spread through many
pixels. This effect can be seen in Figures 19C and 19D. The
effect of numerous intermediate-tone pixels on the texture
image is the production of only a few high texture values.
The contrast between pixels in the image is not large enough
for high texture values.

When  viewed  from  this  perspective,  the  result  that 
local  variance  peaks  before  the  size  of  the  elements  makes 
sense.  The  sampling  theorem  states  that  a resolution  cell 
half  the  size  of  the  element  would  be  necessary  to  assure 
brightly  contrasting  pixels  in  the  image.  This  perspective, 
combined  with  the  increasing  area  covered  by  texture  values 
that are not black as resolution increases, produces a peak
in local variance at about 2/3 to 3/4 the size of trees in
both the simulated and scanned forest images.

The variogram for the simulated image is shown in
Figure 21 and serves to confirm the interpretation of the
previous variograms. The sill occurs at about the width of a
tree from the center. Also, the anisotropic shape is
related to the orientation of illumination, similar to the
results for the fine-resolution forest image. It is
interesting that the variogram of the simulated image has
peaks and pits outside the zone of influence. The
significance of these features is unknown, but it is possible that
they are indicative of periodicity induced by the constraint
placed on the location of trees. Substantiation of such an
effect will require further testing.

Figure 21. Two-dimensional variogram of simulated forest
scene. Units are pixels at highest resolution (1 m).

This study shows that spatial variance in digital
images depends on the nature of objects within the scene
itself, including their size, shape, and spacing. When the
resolution cell size of an image is sufficiently smaller
than the objects that dominate the scene, overall image
texture remains low, the objects can be resolved, and the
two-dimensional variogram will easily reveal their shape. As
resolution cell size increases, local variance will peak at
a resolution cell size near two-thirds of the size of the
object. If the resolution is too coarse to reveal
individual objects, local variance will never peak as the image is
degraded, and the variogram will show little structure.
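
These conclusions suggest a simple way to read a local variance curve. The sketch below is only an illustration: the function name `interpret_curve`, the regime labels, and the default ratio of two-thirds between peak cell size and object size are assumptions drawn from the summary above, not a calibrated method.

```python
def interpret_curve(cell_sizes, local_variances, peak_ratio=2.0 / 3.0):
    """Classify a local-variance curve and, where possible, estimate object size.

    cell_sizes and local_variances are parallel sequences ordered from fine to
    coarse. peak_ratio is the assumed ratio of peak cell size to object size.
    """
    peak_index = max(range(len(local_variances)), key=local_variances.__getitem__)
    if peak_index == 0:
        # Variance only declines: the cells already exceed the objects.
        return {"regime": "cells larger than objects", "object_size": None}
    if peak_index == len(local_variances) - 1:
        # Variance is still rising at the coarsest cell size examined.
        return {"regime": "peak not reached", "object_size": None}
    peak_cell = cell_sizes[peak_index]
    return {"regime": "cells smaller than objects",
            "object_size": peak_cell / peak_ratio}

# Hypothetical curve with a peak at a 6-meter cell size.
result = interpret_curve([1, 3, 6, 9], [2.0, 5.0, 8.0, 6.0])
```

With the two-thirds assumption, a peak at 6 m corresponds to an estimated object size of about 9 m. The estimate is only as good as the assumed ratio, which the text reports as ranging from roughly 2/3 to 3/4 for the forest scenes.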

Figure 22 presents all the local variance graphs
derived for real images on a single graph. Note that
the abscissa is logarithmic; note also that the heights on
the y-axis, which measures local variance, are dependent on
the contrast of each image and are thus not directly
comparable. However, the figure clearly identifies the
sensor-scene combinations for which classification and clustering
are appropriate (where resolution cell size is significantly
smaller than the objects in the scene) as opposed to mixture
modeling (where resolution cell size is significantly larger
than the objects in the scene). Thus, it will be easy to
identify fields with TM data, but urban scenes will require
a different approach than classification. SPOT panchromatic
data, at 10-meter resolution, will delineate urban objects
and forest trees, but will still not reveal the periodic
structure of crop rows within agricultural fields.

Figure 22. Combined local variance graphs for the scenes
studied in this paper. Note that the height of the peaks is
arbitrary, since it depends on relative image brightness.

Future  work,  anticipated  for  the  third  year  of  this 
project,  will  involve  the  explicit  formulation  of  variograms 
for  scenes  composed  of  simple  objects  in  regular  and  random 
arrangements.  We  will  also  formulate  the  exact  relationship 
between parametric variograms and variograms of images in
which spatial averaging within picture elements occurs.

From this formulation, we should be able to link the
variogram directly with the graph of local variance as a
function of resolution cell size. Clarifying these relationships
should allow us to come to a better understanding
of how variograms and other spatial statistics may be used
in remote sensing for better scene modeling, image enhancement,
and image understanding.
Abstract

We examine the task of matching images of a scene when they are taken from
very different vantage points, when there is considerable scale change, and when
the image orientations are unknown. We use the linear structures in the scene
as the basis of our correspondence procedure. This paper considers the problem
of describing the linear structures in a manner that is invariant relative to the
variations  that  can  occur  among  images,  and  discusses  a method  of  finding  the  best 
description  of  the  linear  structures. 


1.  Introduction 


When  the  human  visual  system  is  presented  with  two  views  of  a single  scene,  it 
determines  the  relative  viewing  positions  of  the  two  images  and  brings  the  latter  into 
correspondence.  That  is,  the  relationship  of  each  image  to  the  scene  is  understood 
so  that  both  images  can  be  used  as  information  sources  for  further  processing.  This 
human  ability  functions  well  over  a wide  range  of  viewing  positions  and  conditions. 
It  is  this  ability  to  place  two  very  different  views  of  a single  scene  into  correspondence 
that  we  address  in  this  paper. 

We  should  draw  a distinction  between  two  forms  of  the  image  correspondence 
task.  Traditionally,  image  registration  has  been  a task  undertaken  by  photogram- 
metrists.  One  application  involves  registering  an  image  to  a map  so  that  new  in- 
formation, present  in  the  image,  may  be  transferred  to  the  map.  Another  is  the 
registration  of  the  two  images  of  a stereo  pair  so  that  disparity  information  can  be 
extracted.  In  each  of  these  tasks  the  two  images,  (or,  in  the  first  instance,  the  image 
and  the  map),  are  similar  in  terms  of  both  their  viewing  position  and  their  scale. 
The  techniques  used  for  registering  the  two  images  are  point-based.  A feature  point 
in  one  image  is  matched  to  the  same  feature  point  in  the  other  image.  In  automated 
systems  this  is  achieved  by  selecting  a small  window  about  the  feature  in  one  image 
and  then  correlating  this  window  with  one  in  the  second  image.  If  there  is  little 
distortion  or  occlusion,  this  technique  performs  well;  it  has  become  the  basis  of 
current  automated  image-registration  systems. 
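
The window-correlation step described above can be sketched as follows. This is a minimal illustration, not the procedure of any particular registration system: the use of normalized cross-correlation as the similarity score, the exhaustive search over the second image, and the names `normalized_correlation` and `best_match` are all assumptions made for the example.

```python
import math


def normalized_correlation(a, b):
    """Normalized cross-correlation of two equal-sized windows (lists of rows)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    # A window with no contrast carries no information; score it as zero.
    return numerator / denominator if denominator else 0.0


def best_match(window, image):
    """Slide the window over the image; return ((row, col), score) of the best fit."""
    height, width = len(window), len(window[0])
    best_score, best_position = -2.0, None
    for r in range(len(image) - height + 1):
        for c in range(len(image[0]) - width + 1):
            patch = [row[c:c + width] for row in image[r:r + height]]
            score = normalized_correlation(window, patch)
            if score > best_score:
                best_score, best_position = score, (r, c)
    return best_position, best_score
```

Because the score compares intensities pixel by pixel at a fixed window shape, it degrades quickly when the two views differ in scale or orientation, which is the limitation that motivates the approach taken in this paper.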

The research reported herein was supported by the Defense Advanced Research Projects Agency
under Contract MDA903-83-C-0027 and by the National Aeronautics and Space Administration
under Contract NAS 9-16664. These contracts are monitored by the U.S. Army Engineer
Topographic Laboratory and by the Texas A&M Research Foundation for the Lyndon B. Johnson
Space Center.


The  other  form  of  the  image  correspondence  task  seeks  to  find  the  relationship 
among  views  that  differ  widely  in  vantage  point,  scale,  etc.  We  will  refer  to  this  as 
the  correspondence  task,  and  use  registration  as  the  name  for  the  form  of  the  task 
outlined above. In correspondence tasks there is significant distortion between the
images; the scale may differ and may not even be constant across a single image,
as is the case in oblique aerial imagery; occlusion is common; and the response
of the various sensors to a single feature differs greatly. Under these conditions,
feature point matching, as used in image registration, is prone to error. However, feature point matching
is  not  the  only  means  of  placing  images  into  correspondence.  It  appears  that  the 
human  visual  system  makes  use  of  nonpoint  features,  such  as  linear  structures  and 
extended  landmarks.  The  aspects  of  our  investigation  reported  here  utilize  the 
linear  structures  of  the  images  as  the  prime  elements  for  achieving  correspondence. 

In  classifying  the  methods  that  could  be  employed  to  find  linear  structures  in 
images,  we  draw  a distinction  between  techniques  that  use  semantic  information 
and  those  that  do  not.  If,  for  example,  we  apply  a road  operator  to  locate  some 
of  the  linear  structures  in  an  image,  that  operator  has  had  built  into  it  semantic 
knowledge  about  the  appearance  of  roads.  We  could  proceed  in  this  manner  and 
build  comparable  operators  for  all  the  scene  objects  that  manifest  themselves  as 
linear  structures  in  images.  Alternatively,  we  could  seek  to  find  the  linear  structures 
in  an  image  without  “identifying”  their  nature.  In  this  case,  we  identify  the  image 
behaviour  interpreted  by  us  as  a linear  structure  without  knowledge  of  the  world 
objects  that  gave  rise  to  that  structure.  We  choose  this  latter  course  because  we 
wish  to  establish  the  correspondence  among  images  without  first  having  to  identify 
the  scene  objects. 
The correspondence task is carried out in three stages: we must find the linear
structures, we must build their descriptions and, finally, we must match these
descriptions. The details of the first stage are reported in Fischler and Wolf [1].
In  this  paper  we  explain  how  those  procedures  are  employed  in  the  correspondence 
task.  We  present  a detailed  account  of  our  implementation  of  the  second  stage  - 
namely,  building  structure  descriptions  - along  with  an  outline  showing  how  these 
descriptions  are  to  be  used  in  the  final  matching  stage. 

2.  Finding  the  Linear  Structures 

Descriptions  of  the  semantically  free  procedures  we  use  to  find  linear  structures 
in images can be found in Fischler and Wolf [1]. In essence, these procedures first
find those pixels whose intensity levels are local maxima and minima, then
cluster such pixels and identify the minimal spanning tree for each cluster. The
long paths in each of the spanning trees are found, whereupon these form the basis
for the linear structure reported by the procedures. The results of applying these
procedures are shown in Figures 1-4. Figure 1 is a natural-color oblique view of
the Eel River in northern California; Figure 2 is a vertical infrared view of the
same scene. Each was scanned through red, green, and blue filters; the results of
the procedures for finding linear structures in each of these separation images are
shown in Figures 3(a), 3(c), 3(e) and 4(a), 4(c), 4(e). In addition, the red, green, and
blue  separation  images  were  combined  into  images  of  hue,  saturation,  and  intensity; 
these  were  also  processed  to  find  the  linear  structures  contained  in  them.  The  results 
are  shown  in  Figures  3(b), 3(d), 3(f)  and  4(b), 4(d), 4(f). 
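
The spanning-tree step can be illustrated with a small sketch. It is not the Fischler and Wolf implementation: it assumes a cluster is given as a list of (x, y) pixel coordinates, builds the tree over the complete Euclidean graph with Prim's algorithm, and returns only the single longest path. The names `minimal_spanning_tree`, `farthest`, and `longest_path` are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def minimal_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns adjacency lists."""
    adjacency = defaultdict(list)
    in_tree = [False] * len(points)
    heap = [(0.0, 0, 0)]  # (edge length, node, parent); node 0 seeds the tree
    while heap:
        length, node, parent = heapq.heappop(heap)
        if in_tree[node]:
            continue
        in_tree[node] = True
        if node != parent:
            adjacency[node].append((parent, length))
            adjacency[parent].append((node, length))
        for other in range(len(points)):
            if not in_tree[other]:
                edge = math.dist(points[node], points[other])
                heapq.heappush(heap, (edge, other, node))
    return adjacency


def farthest(adjacency, start):
    """Depth-first walk of the tree; returns (node, distance, path) farthest from start."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, distance, path = stack.pop()
        if distance > best[1]:
            best = (node, distance, path)
        for neighbor, length in adjacency[node]:
            if neighbor != parent:
                stack.append((neighbor, node, distance + length, path + [neighbor]))
    return best


def longest_path(points):
    """Longest path in the spanning tree, found with two farthest-node sweeps."""
    adjacency = minimal_spanning_tree(points)
    end, _, _ = farthest(adjacency, 0)
    _, length, path = farthest(adjacency, end)
    return [points[i] for i in path], length
```

In a tree, the node farthest from any starting node is always one end of a longest path, so two sweeps suffice. Short side branches, such as a single stray pixel beside a river course, drop out of the reported path.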
Figure 1. Oblique Natural-Color Image of the Eel River


Figure  2.  Vertical  Infrared  Image  of  the  Eel  River 


These separation images differ appreciably in their linear structure. Certainly
no one separation image can be selected as providing a complete delineation of
the river. The philosophy we adopt is to view the original image from as many
perspectives as possible, obtaining the linear structures as seen from each of these.
That is, we look for structures in hue, in the green spectral band, and so on. Of
course, the hue image is derived from the red, green, and blue images, and contains
only redundant information, but this presentation of the information may show
structure that was masked in other presentations. In this sense, the additional
perspectives provide new information on which the linear-structure finders can
act. The results of combining the linear structures extracted in all the various
perspectives are shown in Figures 3(h) and 4(h). Clearly, some of this structure
comes from shading effects rather than from physical structure in the scene. We
need to separate the real physical structure from all else.

Figure 3. Linear Structure in the Oblique Image

Figure 4. Linear Structure in the Vertical Image

Figure 5. Linear Structure in the Composite Oblique Image

Figure 6. Linear Structure in the Composite Vertical Image


Figure  7.  Structure  Descriptions 


Figures  3(h)  and  4(h)  were  obtained  by  adding  the  binary  images  produced  by 
the  linear-structure  finders.  Consequently,  in  the  combined  image  the  values  are 
greater  than  one  at  those  pixel  positions  where  linear  structure  was  seen  in  more 
than one separation image. We treat this combined product as a new “grey-level”
image and, once again, apply the linear-structure finders. The results obtained from
applying these procedures to Figures 3(h) and 4(h) are depicted in Figures 5(b) and
6(b). Figures 5(a) and 6(a) show an intermediate result before we cull short structures.
For each of the structures in Figures 5(b) and 6(b), we calculate the average
“intensity”, that is, the average number of original separation images exhibiting that
linear structure. Figures 5(c), 5(d), 5(e), 5(f), 5(g), 5(h) and 6(c), 6(d), 6(e), 6(f), 6(g), 6(h)
reveal which segments would remain if we thresholded the “intensity” values at 1,
1.5, 2, 2.5, 3, and 3.5, respectively.
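
The combination and thresholding steps can be sketched as follows. The sketch assumes that each binary image is a list of rows of 0s and 1s, that a linear structure is a list of (row, column) pixel positions, and that a structure survives when its average “intensity” is at least the threshold; the text does not say whether the comparison is strict, and the function names are hypothetical.

```python
def combine_binary_images(images):
    """Add same-sized binary images pixel by pixel."""
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(image[r][c] for image in images) for c in range(cols)]
            for r in range(rows)]


def average_intensity(combined, structure):
    """Average number of separation images supporting the pixels of one structure."""
    return sum(combined[r][c] for r, c in structure) / len(structure)


def threshold_structures(combined, structures, threshold):
    """Keep the structures whose average intensity reaches the threshold."""
    return [s for s in structures if average_intensity(combined, s) >= threshold]


# Three hypothetical 2 x 3 binary outputs of the linear-structure finders.
separations = [
    [[1, 1, 0], [0, 0, 1]],
    [[1, 0, 0], [0, 0, 1]],
    [[1, 1, 0], [0, 0, 0]],
]
combined = combine_binary_images(separations)
structures = [[(0, 0), (0, 1)], [(1, 2)]]
for threshold in (1, 1.5, 2, 2.5, 3, 3.5):
    print(threshold, threshold_structures(combined, structures, threshold))
```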

We  build  a description  of  the  linear  structures  from  one  of  these  images.  The 
image  we  use  will  depend  on  the  final  matching  procedure.  If  we  wish  to  attempt 
to first match the “strongest” structures we use the image resulting from a high
threshold.  On  the  other  hand,  if  we  wish  to  match  the  complete  structure,  the 
unthresholded  image  would  appear  to  be  more  appropriate.  In  the  next  section, 
where  we  discuss  the  nature  of  the  structure  description,  we  use  as  examples 
the  foregoing  two  extremes.  In  the  case  of  the  oblique  image,  we  have  used  the 
“intensity”  image  at  a threshold  of  3.5  (Figure  7a),  while  for  the  other  case,  the 
vertical  infrared  image,  we  employ  the  unthresholded  image  (Figure  7b). 

3.  Describing  the  Linear  Structures 

The  means  used  to  describe  a linear  structure  is  not  independent  of  the  use  to 
which  this  description  will  be  put.  A description  that  makes  it  possible  to  reproduce 
the  structure  is  different  from  one  that  is  sufficient  to  recognize  it.  As  matching  is 
our  goal,  we  want  a description  that  is  general  enough  to  be  unaffected  by  noise  in 
the  data,  but  specific  enough  to  distinguish  among  structures  that  the  human  visual 
system  would  classify  as  different.  To  the  extent  feasible,  the  description  must  be 
invariant  with  respect  to  the  variations  that  can  occur  in  the  data.  Specifically,  we 
want the description to be independent of orientation, scale, and vantage point.

Our  matching  process  will  compare  graphs  of  symbolic  descriptions.  We  will  use 
as  little  metric  information  as  possible.  Consequently,  the  descriptions  we  employ 
are  symbolic  ones,  the  primitive  entities  in  each  of  which  have  qualities  that  are 
themselves  symbolic.  For  example,  a primitive  may  be  a straight-line  segment  whose 
properties,  such  as  an  intersection  angle  (with  some  other  primitive),  have  values 
acute,  near-colinear,  etc.  rather  than  a value  in  degrees. 
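
As an illustration of a symbolic property, the sketch below maps the angle between two primitives to a label. The 15-degree tolerance, the “near-perpendicular” label, and the decision to treat segments as undirected are assumptions made for the example; the text names only “acute” and “near-colinear”.

```python
import math


def symbolic_angle(direction_a, direction_b, tolerance_deg=15.0):
    """Replace a metric intersection angle with a symbolic value."""
    ax, ay = direction_a
    bx, by = direction_b
    cosine = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    # Clamp against rounding error before taking the arc cosine.
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cosine))))
    # Segments are treated as undirected, so fold the angle into [0, 90].
    angle = min(angle, 180.0 - angle)
    if angle <= tolerance_deg:
        return "near-colinear"
    if angle >= 90.0 - tolerance_deg:
        return "near-perpendicular"
    return "acute"


print(symbolic_angle((1, 0), (1, 0.1)))
print(symbolic_angle((1, 0), (1, 1)))
```

The label depends only on the directions of the two primitives, not on their lengths or on the rotation of the image, which is the property wanted for matching.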
The primitives we have chosen to use are straight-line segments, arcs of circles,
and model-less segments, that is, data we prefer to describe as indescribable: data for which
the data set itself is the most apt description. The choice of these few primitives
stems  from  the  observation  that  human  description  of  linear  structures  seems  to  be 
based  on  curves  and  straight  lines  - moreover  on  whether  adjoining  curves  curve 
the  same  or  opposite  ways  and  whether  adjoining  pieces  of  the  structure  intersect 
in  particular  ways.  It  is  also  a fact  that  humans  find  certain  parts  of  the  structure 
too  difficult  to  describe,  and  assign  them  some  generic  term  like  “wiggles”. 

Selection  of  the  description  primitives  is  only  half  the  task  of  description 
building.  We  need  to  be  able  to  divide  the  linear  structure  into  parts  and  assign 
a primitive  to  each.  Usually  the  task  of  dividing  the  linear  structure  into  parts 
and  describing  each  of  these  parts  has  been  handled  as  two  relatively  independent 
processes  in  which  partitioning  has  preceded  parts  description.  The  difficulty  with 
this  approach  is  that  some  characterization  of  the  breakpoints  between  parts  has  to 
be  found.  Generally,  this  characterization  is  based  only  on  local  properties  of  the 
linear  structure,  even  though  neighborhood  information  or  local  inhibition  may  be 
employed  so  as  to  benefit  from  more  broadly  based  information.  In  this  respect,  the 
task  of  describing  a structure  in  terms  of  its  primitive  parts  appears  to  have  been 
replaced  by  the  more  difficult  undertaking  of  describing  breakpoints.  Our  concern 
is  to  find  the  “best”  description  without  first  having  to  find  the  “best”  subdivision. 
Furthermore,  we  would  like  “best”  to  be  defined  in  terms  of  a global  criterion  rather 
than  local  properties  of  the  structure. 

The advantage of defining “best” in terms of a local criterion is that many candidates
for the definition of “best” spring to mind. The disadvantage of defining
“best” in a global sense is the lack not only of likely definitions, but also of
computationally effective algorithms for finding this optimal solution. However,
a description that views the data from a “gestalt” perspective seems more likely to
be independent of image orientation, scale, and vantage point than one that applies
local data measures to define the optimal description. We define the best description as
the one that minimizes the number of symbols needed to encode the linear structure
in terms of our description primitives.

4.  Minimal  Encoding 

The need to match data to description primitives is a central aspect of decision
theory  and  pervades  artificial  intelligence  research.  It  is  a human’s  ability  to 
abstract  data  in  terms  of  descriptive  models  that  distinguishes  human  information 
processing  from  its  electronic  namesake.  Effective  data  abstraction  is  a balance 
between  two  competing  requirements.  On  the  one  hand  a descriptive  model  must 
fit  the  data  adequately,  while,  on  the  other,  the  descriptive  model  must  not  be 
needlessly  complex.  The  criterion  we  use  to  select  among  competing  descriptions  is 
based  on  the  work  of  Georgeff  and  Wallace  [2],  in  which  the  description  considered 
“best”  is  the  one  that  can  be  encoded  in  the  fewest  symbols. 

Suppose  we  wish  to  send  data  to  some  receiver  so  that  he  can  recreate  the 
data  to  some  preselected  level  of  resolution.  The  sender  and  receiver  have  agreed 
on  a language  for  this  communication  that  consists  of  a set  of  primitive  elements. 
What  is  the  most  efficient  encoding  of  the  data;  which  message  has  the  minimal 
encoding length? Consider the example of sending a message that describes a linear
structure. The latter can be thought of as a list of x and y coordinates. Let
us  further  suppose  that  the  language  of  communication  contains  three  primitives: 
straight-line-segments,  arcs-of-circles,  and  model-less-segments.  Is  it  more  efficient 
to  send  the  data  as  a single  model-less-segment  primitive,  that  is,  as  a list  of  (x,y) 
coordinates,  or  might  it  be  more  efficient  to  describe  the  data  by  one  or  more  of  the 
other  primitives,  specifying  sufficient  information  to  describe  how  the  actual  data 
differ  from  the  primitives? 

The  message  can  be  viewed  as  a list 


$$((M_1, D_1), (M_2, D_2), \ldots),$$

where  M is  the  specification  of  the  primitive,  D the  specification  of  the  data  in 
terms  of  the  selected  primitive  M.  Let  us  consider  an  example.  Suppose  we  have 
a data  set  that  approximates  a straight-line  segment.  We  could  communicate  this 
by  specifying  a straight-line-segment  primitive  M,  where  M consists  of  a code  for 
the  straight-line-segment  primitive  and  parameters  that  specify  the  actual  straight 
line  segment.  These  parameters  might  be  the  endpoints  of  the  line.  We  also  need 
to  specify  the  actual  data  in  terms  of  this  primitive  M.  The  data  specification  D 
might,  for  each  data  point,  specify  its  coordinates  as  a distance  along  the  line  (from 
its  centre)  and  the  perpendicular  distance  from  the  point  to  the  line. 

As  the  expected  distances  from  the  points  to  the  line  are  small,  we  shall  choose 
an  encoding  of  these  distances  so  that  the  more  probable  of  these,  the  smaller 
distances,  are  encoded  in  fewer  symbols  (or  bits)  than  those  that  are  less  likely.  In 
the  actual  examples  we  shall  describe  later,  we  assumed  a Gaussian  distribution  for 
these perpendicular distances and we encoded optimally for that distribution. The
optimal encoding length is just the negative logarithm of the probability, i.e., the
function  denoted  as  “information”  in  information  theory. 
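
A small numerical illustration of this rule follows. The distribution over quantized distances is hypothetical, chosen with probabilities that are powers of two so that the code lengths come out as whole bits; the function names are likewise assumptions.

```python
import math


def optimal_code_lengths(probabilities):
    """Bits per symbol under an optimal code: -log2 of each (positive) probability."""
    return {symbol: -math.log2(p) for symbol, p in probabilities.items()}


def expected_length(probabilities):
    """Average bits per symbol, i.e. the entropy of the distribution."""
    return sum(-p * math.log2(p) for p in probabilities.values() if p > 0)


# Hypothetical probabilities of perpendicular distances 0, 1, 2, 3 (grid units).
distance_probabilities = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
print(optimal_code_lengths(distance_probabilities))
print(expected_length(distance_probabilities))
```

The most probable distance costs one bit and the least probable three, so data that lie close to the primitive are cheap to specify.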

If we have a small number of data points, fewer symbols may be needed to
communicate the data as a list of points; if, however, there is a large number of
data points that exhibit behaviour consistent with a primitive, it will probably be
cheaper to encode this data set as the primitive and then specify the data in terms
of that primitive. Of course we are not just comparing the encoding of all the data
with either one primitive or another. It might be more efficient to encode the data
as a few primitives, with each primitive “explaining” a different part of the data.
The encoding we select is the one that is globally best in explaining all the data.

A way of viewing the message form outlined above,

$$((M_1, D_1), (M_2, D_2), \ldots),$$

is to look upon M as the overhead of introducing another primitive while D
represents the quality of the fit between the data and the primitive. Of course, since
different  primitives  have  different  M’s,  M also  weights  each  primitive’s  efficiency 
for  encoding  data.  In  comparing  message  length  we  are  balancing  the  complexity 
introduced  by  adding  an  extra  primitive  to  the  description  of  the  data  against  the 
quality  of  fit  between  the  assembled  primitives  and  the  data  values. 

Although  the  above  discussion  focused  on  encoding  messages  for  communica- 
tion, we  use  minimal  encoding  length  as  the  criterion  for  finding  the  best  description 
of  a linear  structure  - without  any  interest  on  our  part  in  actually  transmitting  the 
data.  This  of  course  means  that  we  only  have  to  decide  how  many  symbols  would 
be  used  if  we  were  to  encode  the  linear  structure  in  a particular  manner  rather 
than actually doing the encoding. We can use the results of information theory to
determine  the  optimal  encoding  length  without  even  having  to  understand  what 
the  optimal  encoding  scheme  is.  That  is,  information  theory  gives  us  an  operator, 
or  a measure,  that  we  can  apply  to  a description  to  determine  how  many  symbols 
we  would  need  if  we  were  to  encode  it  optimally,  without  any  consideration  of  the 
actual  encoding  scheme  and  without  the  need  to  do  the  encoding. 

Let  us  consider  our  application,  encoding  linear  structures  in  terms  of  three 
primitives: straight-line-segments, arcs-of-circles, and model-less-segments. We will
assume that the data are specified on an $N \times M$ grid (here M is a grid dimension,
not the primitive specification introduced above), and that the noise in the data will
induce a Gaussian distribution of the data points around the generating primitive.
Given that all grid points are equally likely, the cost in bits of encoding a grid point
is $\log N + \log M$ (log is to the base 2). Now consider the three alternative ways of
encoding r data points (using one primitive only).

Model-less-segments:

We need a code to specify that the primitive being used is the model-less-segment.
As there are only three primitives, and we assume that they are all equally
likely, it costs $\log 3$ bits to specify the code. Specification of the data in terms of
this primitive will require in turn that we specify r grid coordinates, that is, a cost
of $r(\log N + \log M)$ bits.

Straight-line-segment: 

We  can  specify  the  straight-line-segment  primitive  by  specifying  the  endpoints 
of the line segment. This costs $2(\log N + \log M)$ bits. In addition, the cost of
specifying the code for this primitive is $\log 3$ bits. To specify the data in terms of
this primitive we will, for each data point, specify a distance along the line and
the perpendicular distance from the point to the line. If the line segment is of
length $l$ (in grid units) then specifying r distances along the line, if we assume that all distances
are equally likely, will cost $r \log l$ bits. If it is also assumed that the data points
have a Gaussian distribution about the primitive model, the cost of specifying r
perpendicular distances is


where $d$ is the perpendicular distance from the point to the line, and $\sigma$ the standard
deviation  associated  with  the  distribution.  When  the  above  expression  is  expanded, 
the  sum  over  the  d’s  is  just  the  sum  of  the  residuals  squared  that  is  calculated  when 
the  line  is  fitted  to  the  data  by  least-squares  methods. 
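
One expression consistent with this description is sketched below. It assumes that each perpendicular distance $d_i$ is quantized to one grid unit, so that the probability of a distance is approximated by the Gaussian density evaluated at $d_i$; the quantization step, the symbol $C_\perp$, and the summation index $i$ are assumptions introduced here, not details given in the text.

```latex
\begin{align}
C_\perp &= -\sum_{i=1}^{r} \log\!\left[\frac{1}{\sigma\sqrt{2\pi}}
           \exp\!\left(-\frac{d_i^{2}}{2\sigma^{2}}\right)\right] \\
        &= r \log\!\left(\sigma\sqrt{2\pi}\right)
           + \frac{\log e}{2\sigma^{2}} \sum_{i=1}^{r} d_i^{2}
\end{align}
```

In the expanded form, the only data-dependent term for fixed $\sigma$ is the sum of squared residuals, which is the quantity minimized by a least-squares line fit.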

Arcs-of-circles:

We specify the arcs-of-circles primitive by specifying the endpoints of the arc
and one other point on the arc. This costs $3(\log N + \log M)$ bits, while the cost of
specifying the code for this primitive is $\log 3$ bits. To specify the data we use the
same  scheme  as  we  did  for  the  straight-line-segment  primitive. 

Using  these  costing  functions  and  a search  algorithm  that  examines  the  various 
ways  for  partitioning  a linear  structure  into  primitives,  we  find  the  best  description 
of  that  structure. 
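
The costing functions and a partition search can be sketched together. The sketch below makes several simplifying assumptions that are not taken from the text: it omits the arcs-of-circles primitive, defines each line by the first and last points of its run instead of a least-squares fit, fixes $\sigma$ at one grid unit, approximates the Gaussian probability by the density at unit quantization, and uses dynamic programming over contiguous runs as the search (the text does not name its search algorithm). All function names are hypothetical.

```python
import math

CODE_BITS = math.log2(3)  # three equally likely primitive codes


def point_bits(grid_n, grid_m):
    """Cost of one grid point when all grid points are equally likely."""
    return math.log2(grid_n) + math.log2(grid_m)


def model_less_cost(points, grid_n, grid_m):
    """Primitive code plus one grid coordinate per data point."""
    return CODE_BITS + len(points) * point_bits(grid_n, grid_m)


def line_cost(points, grid_n, grid_m, sigma=1.0):
    """Primitive code, two endpoints, then along-line and perpendicular distances."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if len(points) < 2 or length == 0:
        return math.inf  # a line needs two distinct endpoints
    cost = CODE_BITS + 2 * point_bits(grid_n, grid_m)
    cost += len(points) * math.log2(max(length, 1.0))  # distances along the line
    for x, y in points:
        d = abs((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)) / length
        density = math.exp(-d * d / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        cost += -math.log2(density)  # perpendicular distance, Gaussian assumption
    return cost


def best_description(points, grid_n, grid_m, sigma=1.0):
    """Cheapest split of an ordered point list into contiguous primitives."""
    n = len(points)
    best = [0.0] + [math.inf] * n   # best[j]: cheapest encoding of points[:j]
    choice = [None] * (n + 1)       # back-pointers: (run start, primitive name)
    for end in range(1, n + 1):
        for start in range(end):
            run = points[start:end]
            candidates = (
                ("model-less", model_less_cost(run, grid_n, grid_m)),
                ("line", line_cost(run, grid_n, grid_m, sigma)),
            )
            for name, cost in candidates:
                if best[start] + cost < best[end]:
                    best[end] = best[start] + cost
                    choice[end] = (start, name)
    description, end = [], n
    while end > 0:
        start, name = choice[end]
        description.append((name, start, end))
        end = start
    return best[n], description[::-1]


# An L-shaped structure on a 512 x 512 grid: two straight runs of 20 points.
corner = [(x, 0) for x in range(20)] + [(19, y) for y in range(1, 21)]
print(best_description(corner, 512, 512))
```

Because the total cost is a sum over runs, the breakpoints fall out of the global minimization and never have to be characterized separately. With only two data points the same search returns a single model-less segment, since the overhead of specifying a line is not repaid.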
5. Results

The  results  of  using  the  foregoing  procedure  on  some  of  the  linear  segments 
found in Figures 1 and 2 (and shown in Figures 7(a) and 7(b)) are depicted in the
remaining panes of Figure 7. From Figures 7(a) and 7(b) we have selected some
linear structures. The selected structures, which form the main course of the Eel
River, are shown in Figures 7(c) and 7(d). Our interest is in determining whether the
description  built  from  one  image  is  the  same  as  that  from  the  other.  Of  course,  in 
the  final  version  of  the  structure  builder  we  would  need  to  handle  all  the  segments 
simultaneously,  but  this  will  necessitate  considerable  improvement  in  the  search 
algorithm  to  keep  computational  costs  down  to  a reasonable  level. 

Figures 7(e) and 7(f) show the primitives returned. The arcs of circles are shown
as  full  circles  to  improve  readability.  In  Figures  7(g)  and  7(h)  the  primitives  have 
been  overlaid  on  the  data  to  show  the  quality  of  fit.  In  assessing  these  results, 
one  should  keep  the  purpose  of  this  description  in  mind.  We  want  to  extract  a 
description  of  the  linear  structure  in  terms  of  lines  and  curves,  in  terms  of  the 
manner  in  which  parts  intersect  (acute  angles,  near-colinearity,  etc.),  in  terms  of 
relative  curvature  (tight  curves,  gentle  curves,  and  the  like),  and  in  terms  of  the 
sequencing  of  parts  in  the  structure.  Given  that  the  two  images  are  viewed  from  very 
different  vantage  points,  that  the  scale  is  quite  different  (not  even  constant  in  one 
image),  that  one  image  was  taken  in  the  infrared  band  and  one  in  the  visible  band, 
that the images were taken one-and-a-half years apart during different seasons,
and  that  no  semantic  information  was  used  in  the  processing,  the  closeness  of  the 
resulting  descriptions  is  noteworthy.  This  points  to  the  usefulness  of  processing 
the  data  in  the  above  manner;  namely,  the  method  of  finding  the  linear  structures; 
the  primitives  used  to  encode  the  structure;  and  the  encoding-length  measure  as  a 
criterion  for  best  description. 

Figure 7 shows the results obtained with real data. Similar results have been
obtained in experiments that employ other real data sets. Justification of the
method, however, requires further extensive experimentation. To better understand
the behavior of the description builder we include an example using synthetic data.
The data points are shown in Figures 8(a) and 8(b). In Figure 8(b) one extra data
point has been added to those shown in Figure 8(a). The resulting descriptions are
shown in Figures 8(c) and 8(d) and overlaid on the data in Figures 8(e) and 8(f).
The addition of one critical point alters the description, an effect not unknown in
the human visual system. The resulting descriptions seem to match those perceived
by humans when they are presented with Figures 8(a) and 8(b). While we could not
claim that minimal encoding is the criterion used by the human visual system for
description building, we note that this criterion conforms to the type of behavior
we would want to achieve if we were modeling the visual system. Of course, if the
resultant description is sensitive to every addition or deletion of a data point it is of
little use. In general, the minimal-encoding-length description appears to be stable
with respect to data changes, except when “critical” points are added or deleted.

Figure 8. Encoding of Synthetic Data

6.  Matching  the  Descriptions 

If  the  description  we  obtain  from  the  description  builder  characterizes  the 
data  and  is  invariant  with  respect  to  orientation,  scale,  and  vantage  point,  the 
burden  of  matching  descriptions  is  lightened  considerably.  It  is  our  intent  to  match 
descriptions  at  the  symbolic  level,  to  represent  the  descriptions  found  by  minimal 
encoding  as  graphs  of  symbolic  entities,  and  to  match  those  graphs  on  the  basis 
of  their  structure.  Of  course,  it  is  unlikely  that  the  graphs  derived  from  different 
images  will  match  perfectly.  Nevertheless,  from  a prospective  match  we  can  find 
correspondences  in  the  original  images,  and  calculate  the  camera  transformation 
between  the  images. 

This  procedure  allows  data  in  one  image  to  be  transformed  into  the  other.  It 
means  that  we  can  transform  a linear  structure  found  in  one  image  into  the  other 
image.  For  those  parts  of  the  graph  where  there  is  a mismatch  we  can  ask  the 
question:  how  would  the  linear  structure  that  is  associated  with  the  mismatch  be 
encoded  if  it  were  first  transformed  into  the  other  image  and  then  encoded?  In  this 
manner  we  can  attempt  to  explain  the  graph  mismatches.  If  we  cannot  explain  the 
mismatches  we  should  consider  another  match  of  the  graphs.  Through  this  process 
of  hypothesis  and  verification,  we  seek  to  avoid  acceptance  of  a transformation  that 
does  not  explain  “all”  the  data. 
7. Conclusion

Having  found  the  linear  structures  in  an  image,  we  are  faced  with  two  major 
tasks  before  we  can  use  these  structures  to  find  the  correspondence  between  different 
images  of  a scene.  We  need  to  be  able  to  describe  these  structures  in  a way  that  is 
independent  of  the  variations  that  can  occur  between  the  images,  and  we  need  to 
be  able  to  match  these  descriptions  to  find  the  relationship  between  the  images. 

In  considering  structure  description  we  show  that  the  usual  technique  of  divid- 
ing the  structure  into  parts  and  then  describing  the  latter  can  be  replaced  by  a 
procedure  that  finds  the  “best”  description  of  the  data  on  the  basis  of  a global 
view  of  that  data.  This  technique  simultaneously  divides  the  structure  into  parts 
and  describes  them.  “Best”  is  defined  as  the  cheapest  encoding  of  the  data  when
we  consider  the  trade-off  between  the  quality  of  explanation  of  the  data  and  the 
complexity  of  that  explanation. 

This  approach  produces  a description  of  linear  structures  that  appears  rela- 
tively insensitive  to  the  vantage  point,  scale,  and  orientation  of  the  original  images. 
It  may  prove  to  be  a description  that  enables  easy  matching,  and  hence  an  effective 
approach  to  solving  the  problem  of  image-to-image  correspondence.

Abstract


This  report  describes  work  in  the  area  of  subpixel 
accuracy  in  image  registration  and  edge  detection.  Two 
main  directions  of  research  were  pursued:  edge  detection
and  matching  based  on  the  digital  geometry  of  edges,  and
random  field  models  for  probabilistic  analysis  of  registration
error.  In  the  edge  detection  approach,  error  bounds
and  error  probabilities  were  computed  using  theoretical
models.  Algorithms  were  developed  and  tested  on  simulated
imagery.  The  methods  appear  promising  for  high  accuracy 
edge  position  estimation  and  registration,  though  further 
refinement  of  the  procedures  will  be  required.  Using 
random  field  models,  a statistical  measure  of  the  quality  of 
the  cross  correlation  peak  as  an  estimate  of  the  offset 
between  a sensed  and  a reference  image  was  developed. 
Simulations  were  performed  to  determine  the  validity  of  this 
estimate  with  real  imagery  and  to  study  the  results  of 
interpolating  digital  correlation  functions  to  estimate  the 
translation  offset  to  subpixel  accuracy.

Section  0.  Introduction

Subpixel  accuracy  in  the  registration  of  images  and  the 
location  of  objects  within  images  is  a topic  of  growing 
importance.  Many  users  with  high  accuracy  mensuration  and 
classification  requirements  are  faced  with  the  difficulty 
of  using  high  resolution  monochromatic  imagery  or  lower 
resolution  multispectral  imagery  such  as  LANDSAT.  The
accuracy  requirements  have  driven  many  users  from  the  lower 
resolution  imagery,  but  recently  the  possibility  of  using 
both  types  of  imagery  together  has  been  considered.  An 
additional  interest  in  subpixel  registration  accuracy 
results  from  the  need  for  high  accuracy  interband  registration
to  improve  classification  accuracy.

This  report  describes  our  continuing  efforts  in  the 
analysis  of  subpixel  registration  accuracy.  Two  main 
directions  of  analysis  have  been  pursued.  The  first 
approach  uses  the  digital  structure  of  straight  edges  in 
imagery  to  aid  in  the  matching  of  images.  This  work  has
two  basic  applications.  One  is  registration,  the  major 
interest  of  our  work.  The  second  application  is  to  the 
location  to  subpixel  accuracy  of  structures  in  images.  The 
two  applications  should  be  carefully  distinguished.  Subpixel
registration  accuracy  only  ensures  subpixel  alignment
based  on  control  objects  such  as  points  or  lines.  Subpixel
feature  detection  is  of  direct  interest  for  features  which
may  not  even  appear  in  previous  images.  Our  methods  are  of
particular  interest  in  mensuration  problems  for  new  features,
such  as  measuring  road  widths  and  building  sizes.
The  study  on  the  application  of  the  digital  structure 
of  edges  to  registration  accuracy  took  two  major  directions 
in  the  current  study.  First,  we  continued  our  theoretical 
analysis  of  the  structure  of  digital  lines.  One  key  result
of  this  effort  was  the  development  of  proofs  for  the 
formulas  describing  the  set  of  lines  which  could  give  rise 
to  an  observed  digital  edge.  During  the  first  year  effort, 
we  used  formulas  which  were  published  in  [Do-Sm]  without  proof.
Efforts  to  obtain  further  details  from  the  authors  were 
unsuccessful.  Due  to  the  key  importance  of  these  formulas 
and  the  difficulty  of  verifying  them,  we  derived  the 
results.  The  derivations  as  well  as  the  formulas  themselves 
are  important  when  we  consider  the  problem  of  digital  edges 
in  which  some  pixels  are  incorrect.  This  work  thus  provides 
a cornerstone  for  the  theoretical  analysis  of  noisy  edges. 
Due  to  the  complexity  of  the  enumeration  problem  for  digital 
lines,  it  is  useful  to  develop  asymptotic  formulas  for  the 
count  of  digital  lines.  Asymptotic  results  are  described
here  together  with  limited  empirical  verification.  These 
results  are  in  turn  used  to  develop  asymptotic  error 
estimates  for  the  accuracy  of  edge  location  given  the 
correct  digital  edge. 

The  second  part  of  our  work  on  the  digital  structure  of 
edges  for  registration  accuracy  involved  the  computation  of
average  registration  accuracy  for  various  models.  This
portion  of  the  study  also  led  to  the  formulation  of  procedures
for  subpixel  edge  estimation  which  used  the  grey  levels
along  the  edge  to  estimate  subpixel  accuracy.  One  particularly
promising  approach  led  to  an  average  accuracy  of  well
under  0.1  pixel  in  a limited  simulation  study.  This
approach,  which  appears  quite  promising,  is  a natural
extension  of  the  digital  edge  matching  of  the  first  year,  and
we  are  in  the  process  of  trying  to  extend  the  analysis  to
cover  this  method.

The  second  main  approach  taken  in  our  study  is  based  on 
the  correlation  structure  of  imagery.  Using  the  theory  of 
stationary  random  fields,  we  derived  the  probability  of  the 
peak  of  the  cross-correlation  function  between  a sensed  and 
reference  image  being  more  than  a specified  distance  from 
the  true  offset.  Simulations  were  performed  to  determine 
the  validity  of  this  estimate  and  to  determine  the  quality 
of  the  estimation  of  the  offset  using  the  peak  of  a quadric 
surface  fitted  to  the  correlation  function. 

The  results  described  in  this  report  provide  a founda- 
tion for  the  modelling  of  subpixel  accuracy.  In  addition, 
methods  are  developed  which  appear  promising  on  an  experi- 
mental level.  The  theoretical  methods  developed  have  been 
applied  to  simplified  versions  of  the  methods  and  work  is 
currently  underway  to  extend  this  analysis  and  test  the 
methods  more  thoroughly.

NOTATION

$\lfloor x \rfloor$  - greatest integer <= x
$\lceil x \rceil$  - least integer >= x
$m \wedge n$  - greatest common divisor of m and n
L(a,b)  - line joining points a and b
$\varphi(n)$  - Euler totient function: the number of
          positive integers less than or equal to
          n which are relatively prime to n
$\mu(n)$  - the Moebius function, defined as
          follows:

          $\mu(1) = 1$;

          if n > 1, let $n = p_1^{a_1} \cdots p_k^{a_k}$ be the
          prime decomposition of n. Then
          $\mu(n) = (-1)^k$ if $a_1 = \cdots = a_k = 1$,
          $\mu(n) = 0$ otherwise.

Section  1.  INTRODUCTION  TO  GEOMETRIC  REGISTRATION
Matching  edges  in  sensed  and  reference  images  can  be 
used  for  registration.  The  degree  to  which  the  position  of 
a real-world  edge,  such  as  a field  boundary,  can  be  located 
in  imagery  depends  heavily  upon  one's  knowledge  of  the 
scene  and  the  sensors.  Edge  detectors  can  be  used  to 
locate  reasonable  candidates  for  edge  points  and  then  an 
edge  can  be  more  precisely  fit  using  these  points.  Alter- 
natively, an  estimate  of  subpixel  edge  location  can  be 
formed  directly  from  the  grey  levels  [Hy-Da].  Hybrid
approaches  may  also  be  adopted.  We  study  the  accuracy
attainable  using  the  first  of  these  approaches,  which  we 
call  the  geometric  accuracy  approach. 

Before  launching  into  a description  of  our  model  for
information  to  define  a translation  between  the  sensed  and
the  reference  image.  In  the  ideal  case,  the  grey  levels  on 
each  side  of  the  edge  are  constant  off  the  edge  pixels  and 
the  edge  pixel  grey  levels  are  a simple  weighted  average  of 
these  two  grey  levels.  If  all  grey  levels  are  possible  and 
the  edge  pixels  are  all  known  then  the  position  of  the  edge 
can  be  exactly  determined.  Such  a situation  is  clearly 
unrealistic,  but  it  serves  as  a starting  point  for  approximation.

Most  current  methods  for  attaining  subpixel  accuracy 
employ  some  type  of  interpolation  of  the  correlation 
function.  If  such  a method  is  to  achieve  subpixel  accur- 
acy, the  digital  correlation  function  must  be  able  to 
achieve  pixel  accuracy.  In  our  work,  we  assume  pixel 
accuracy  is  available  either  through  correlation  or  other 
methods.  Thus,  in  the  simple  case  of  a one-dimensional 
shift  any  real  world  point  can  be  determined  to  lie  within 
a 3x1  pixel  strip.  Our  results  can  be  improved  drastically 
if  we  assume  we  know,  from  registration,  we  are  in  the 
correct  pixel,  but  this  is  a highly  unrealistic  assumption. 

The  analysis  described  in  this  paper  pertains  to  the 
problem  of  one-dimensional  translations.  This  is  not  particularly
restrictive  since  the  two-dimensional  problem  can
be  easily  decomposed  into  one-dimensional  shift  estimates. 
In  the  line  location  estimation  problem,  we  are  trying  to 
locate  a real  world  line  y = mx  + b in  the  image.  A shift 
$(\Delta x, \Delta y)$ between real world and image coordinates yields a
line $y = m(x - \Delta x) + b + \Delta y$ in the image. This may be
written as $y = mx + b + (\Delta y - m\,\Delta x)$, which is the original
line shifted only in the y direction, by an amount
$\Delta y - m\,\Delta x$. Our 1-d estimation procedures enable us to
estimate $\Delta y - m\,\Delta x$. Given two lines, we can solve (possibly
using least squares) for $\Delta x$ and $\Delta y$ separately. From this
point on, we will confine ourselves to 1-d shifts.
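
The recovery of the two shift components from several 1-d estimates can be sketched as follows. This is only an illustration under stated assumptions: the function name `solve_shift` and its inputs are hypothetical, each line k is assumed to contribute one measured vertical offset d_k that approximates Δy - m_k Δx, and the unknowns are found from the ordinary least-squares normal equations rather than from any procedure given in this report.

```python
def solve_shift(slopes, offsets):
    """Least-squares (dx, dy) from per-line estimates d_k ~ dy - m_k * dx.

    slopes  -- slopes m_k of the matched lines (hypothetical input)
    offsets -- measured 1-d vertical offsets d_k, one per line
    """
    n = len(slopes)
    if n < 2 or n != len(offsets):
        raise ValueError("need at least two lines and one offset per slope")
    # Sums that appear in the normal equations of min sum (dy - m*dx - d)^2.
    s_mm = sum(m * m for m in slopes)
    s_m = sum(slopes)
    s_md = sum(m * d for m, d in zip(slopes, offsets))
    s_d = sum(offsets)
    det = n * s_mm - s_m * s_m
    if abs(det) < 1e-12:
        raise ValueError("the lines must not all have the same slope")
    dx = (s_m * s_d - n * s_md) / det
    dy = (s_mm * s_d - s_m * s_md) / det
    return dx, dy


# Two lines with different slopes determine the shift exactly.
true_dx, true_dy = 0.3, -0.2
slopes = [0.25, 0.75]
offsets = [true_dy - m * true_dx for m in slopes]
dx, dy = solve_shift(slopes, offsets)
```

With exactly two lines of different slope the system is square and the solution reproduces the shift; with more lines the same code averages out errors in the individual 1-d estimates.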

The  models  described  here  assume  a set  of  pixels 
labelled  edge  pixels  are  provided  by  an  edge  detection 
procedure.  Three  cases  can  be  considered.  First,  the  set
of  edge  pixels  is  exactly  the  digital  edge  corresponding
to  a line  in  the  real  world.  This  model  is  unduly  restric- 
tive since  an  edge  which  comes  very  near  a pixel  boundary 
can  show  up  in  the  next  pixel  due  to  noise.  Second,  one 
could  consider  a model  in  which  the  set  of  edge  pixels 
given  is  a subset  of  the  digital  edge  corresponding  to  the 
real  edge.  This  approach  is  more  realistic  since  it 

enables  us  to  discard  some  pixels  whose  classification  as 
edge  pixels  is  in  doubt.  Finally,  we  could  give  a model 
in  which  some  pixels  lying  on  the  digitization  of  the  real 
edge  are  given  and  some  incorrect  pixels  are  given. 

For  the  first  model,  in  which  a complete  digital  edge 
is  available,  a tight  upper  bound  for  the  registration 
error  as  a function  of  the  line  parameters  is  given 
(Section  4).  This  allows  us  to  give  some  probabilistic 
error  estimates  for  the  family  of  all  digital  lines 
(Section  7).  This  analysis  provides  the  firm  basis  for
the  study  of  the  second  and  third  models,  but  since  our
results  in  these  cases  are  not  yet  so  complete,  we  leave
them  to  future  reports.  We  notice  nevertheless  that  in
many  applications  one  only  finds  rather  short  digital
lines,  with  information  only  on  10  to  20  pixels,  and  hence
occasionally,  e.g.  if  no  analytic  formulas  are  available,
it  might  be  perfectly  justified  to  rely  on  computer-aided
counting  of  possibilities  when  this  counting  is  not  too
time  consuming.

As a first step in our analysis we parametrize the
chain codes of digital lines (see Section 2 for definitions)
by four parameters N,q,p,s as proposed by [Do-Sm] and use
some formulae from the same paper. Regrettably, the report
[Do-1] in which these formulas are proved is apparently not
available, and hence we supply our own proofs in
the text (Section 3). There is an excellent report [Ro-We]
which seems to be generally unnoticed, and where there are
several characterizations of those chain codes corresponding
to digital lines among all possible strings of 0's and 1's.
We do not use their results explicitly but they seem
essential in the analytic study of the second and third
models. We point out that in [Do-Sm] and [Ro-We], as in
other work in the literature, no attention is paid to the
counting of all digital lines of finite length. It is not
enough to count lines through the origin as done in [Ro-We],
and since our probabilistic analyses require such a count,
we give an exact formula for the number of all lines in
Section 7, which is not a straightforward generalization
of the formula for lines through the origin. We also
provide asymptotic bounds for the number of lines of a
given length, as well as grounds for the reasonable
conjecture that this number L(N) is of the form

$$L(N) = \frac{N^3}{\pi^2} + O(N^2 \log N).$$

The  proof  of  this  conjecture  remains  an  open  problem.


Section  2.  Digital  Straight  Line  Segment  Parameter 
Estimation 

Estimation  of  the  location  parameters  of  a real  world 
edge  giving  rise  to  an  image  edge  is  discussed  in  this 
section.  The  ideas  discussed  are  a summary  of  those  parts
of  [Do-Sm]  which  are  useful  for  subpixel  registration. 
Their  basic  result  is  a determination  of  all  lines  whose 
digitization  is  a specified  chain  code.  In  later  sections, 
this  set  of  lines  will  be  used  to  derive  error  bounds  on 
registration  accuracy. 

Several  line  digitization  procedures  are  commonly  used 
in  graphics  and  image  processing.  Given  a line  segment  in 
the  upper  right  hand  quadrant  of  the  plane,  with  slope  and 
y-intercept  both  between  0 and  1 and  strictly  less  than  1, 
we define its digitization as follows: To each intersection
(a,b) between the line and a line x = a, a an
integer, we associate the pixel with lower left hand corner
$(a, \lfloor b \rfloor)$ (see Figure 2.1). The chain code of the sequence
of pixels with lower left hand coordinates $(0,b_0), (1,b_1), \ldots, (N,b_N)$
is the sequence $c_1, \ldots, c_N$, where

$$c_i = \begin{cases} 0 & \text{if } \lfloor b_i \rfloor = \lfloor b_{i-1} \rfloor \\ 1 & \text{otherwise} \end{cases}$$

The restrictions on the slope and y-intercept of the lines
under consideration are made for simplicity of presentation.
By symmetry the results can be extended to remove
these conditions.
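
The digitization rule just described can be sketched in a few lines. The names `pixel_rows` and `chain_code` are hypothetical, exact rational arithmetic is used so that the floor operation is never disturbed by rounding, and the particular line in the usage example is an assumption chosen only because it reproduces the pixel sequence listed in the caption of Figure 2.1.

```python
from fractions import Fraction
from math import floor


def pixel_rows(slope, intercept, n):
    """Rows floor(slope * a + intercept) of the pixels met at x = 0, 1, ..., n."""
    return [floor(slope * a + intercept) for a in range(n + 1)]


def chain_code(slope, intercept, n):
    """Chain code c_1 ... c_n: 0 if the row is unchanged from x = i-1 to x = i, else 1."""
    rows = pixel_rows(slope, intercept, n)
    return "".join("0" if rows[i] == rows[i - 1] else "1" for i in range(1, n + 1))


# An assumed line with slope and intercept in [0, 1) whose pixels have lower
# left hand vertices (0,0), (1,0), (2,0), (3,1), (4,1), (5,1).
rows = pixel_rows(Fraction(1, 3), Fraction(1, 5), 5)
code = chain_code(Fraction(1, 3), Fraction(1, 5), 5)
```

Note that N + 1 pixels give a chain code of length N, since each code digit compares two neighbouring columns.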


To determine the lines with specified chain code, it
is useful to have a parametrization of the set of all chain
codes of digital line segments resulting from digitizing
the class of lines specified above. In [Do-Sm] the
following parametrization is given. A digital line segment
chain code $(c_1, \ldots, c_N)$ is given by a quadruple of integers
(N,p,q,s).

N is the length of the chain code, i.e., the number of
0's and 1's. We note that not every string of 0's and 1's
is generated by a line segment. For a characterization of
those that are, see [W-R].

Figure 2.1  Chain code of a digital line. The digitization
            of the dark diagonal line has pixels with lower
            lefthand vertices (0,0), (1,0), (2,0), (3,1),
            (4,1), (5,1). The resulting chain code indicated
            by the arrows is 00100.

Next, q is defined to be the smallest integer such that
there exists an extension $c_{N+1}, c_{N+2}, \ldots$, with $c_1, c_2, c_3, \ldots$
periodic with period q. Define p to be the number of ones
in a period. The fourth parameter, s, provides a normalization
of the chain code for one period. Geometrically,
s may be interpreted as follows. Any chain code corresponds
to a line segment with rational slope. Among all
such segments, select the slope p/q with $p \wedge q = 1$ which has
the minimum q. This q is the period. The standard chain
code corresponding to the first period of this chain code
is the chain code of the digitization of the first q pixels
of the line through the origin, y = (p/q)x. The i-th
element $c_i$ of this chain code is given by

$$c_i = \lfloor i(p/q) \rfloor - \lfloor (i-1)(p/q) \rfloor, \quad i = 1, 2, \ldots, N$$


The parameter s, of a code string of length N, is defined
by the condition that the standard code string of p/q
starts at the (s+1)th element of the original chain code.

Given the parameters N,q,p,s of a code string, the ith
element of the original code string can be obtained by

$$c_i = \lfloor (i-s)(p/q) \rfloor - \lfloor (i-s-1)(p/q) \rfloor, \quad i = 1, 2, \ldots, N$$

The parameters satisfy the constraints $0 \le p \le q \le N$ and
$0 \le s \le q-1$. A point which will be particularly important for
the registration problem is that there are constraints on
the parameters other than the above inequalities. These
additional constraints are described in Section 3. Our

interest  in  these  matters  stems  from  the  need  to  enumerate 
the  digital  lines  satisfying  various  conditions.  If  it 
were  not  for  these  messy  constraints,  the  enumeration 
problems  would  often  be  straightforward.  Without  these 
additional constraints for fixed N, we would obtain all
digital line segments of length N by independently varying
s,p,q subject to the constraints $0 \le p \le q \le N$ and $0 \le s \le q-1$.

We  now  give  an  example  of  the  computation  of  the 
parameters  for  a chain  code. 


EXAMPLE:  Chain code 10010100

     N = 8:  there are 8 digits in the code

     q = 5:  the above code is part of the infinite code
             ...100101001010010...

     p = 2:  the number of 1's in the period 10010 is 2

     s = 1:  the standard codestring of 2/5 is 00101.
             The standard codestring starts at the 2nd
             element of the chain code. Hence s = 1.
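
As an illustration of the two floor formulas above, the following sketch regenerates a chain code from its four parameters. The function names are hypothetical; integer floor division is used in place of the floor of a quotient, which gives the same value for negative arguments as well.

```python
def chain_code_from_params(n, q, p, s):
    """Chain code c_1 ... c_n with c_i = floor((i-s)p/q) - floor((i-s-1)p/q)."""
    if not (0 <= p <= q <= n and 0 <= s <= q - 1):
        raise ValueError("parameters must satisfy 0 <= p <= q <= n and 0 <= s <= q-1")
    return "".join(
        str(((i - s) * p) // q - ((i - s - 1) * p) // q) for i in range(1, n + 1)
    )


def standard_code(p, q):
    """Standard code string of p/q: one period of the line y = (p/q) x (s = 0)."""
    return chain_code_from_params(q, q, p, 0)


example = chain_code_from_params(8, 5, 2, 1)   # the worked example above
standard = standard_code(2, 5)                 # standard code string of 2/5
```

The sketch only checks the simple inequalities; the additional constraints on (N,q,p,s) mentioned in the text are not enforced here.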

Since the smallest period plays an important role, let
us point out two ways of computing it. The first one might
be easier to use for long strings with the help of the FFT;
the second one is very convenient for direct computation in
short strings.

For the first algorithm extend the chain code to the
right, with period N, i.e. $c_{i+kN} = c_i$. Then

$$q = \inf\Big\{ j : 1 \le j \le N \text{ such that } \frac{1}{N} \sum_{i=1}^{N} (-1)^{c_i + c_{i+j}} = 1 \Big\}.$$

Notice that the maximum value of the average in the definition
of q is precisely 1. In the second algorithm, we
extend the code chain in both directions by zeroes and
consider

$$q = \inf\Big\{ j : 1 \le j < N \text{ such that } \frac{1}{N-j} \sum_{i=1}^{N-j} (-1)^{c_i + c_{i+j}} = 1 \Big\}$$

with the understanding that if the set of j's is empty we
take q = N. What this really means is that we slide
the chain code successively to the right and compare the
tail end of the original chain code with the first portion
of the shifted chain code; the value q corresponds to the
first perfect match, and if there are no matches then q = N.
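
The sliding comparison described for the second algorithm can be sketched directly, together with a recovery of the remaining parameters. The function names are hypothetical, and the search for s simply tries each admissible value against the floor formula of this section; it is an illustration, not a procedure taken from [Do-Sm]. For strings that are not chain codes of digital lines the search may fail, in which case `None` is returned.

```python
def smallest_period(code):
    """Smallest shift j for which the overlap of the code with its shifted copy matches."""
    n = len(code)
    for j in range(1, n):
        if all(code[i] == code[i + j] for i in range(n - j)):
            return j
    return n  # no perfect match: take q = N


def params_from_code(code):
    """Return (N, q, p, s) for a chain code string of 0's and 1's, or None."""
    n = len(code)
    q = smallest_period(code)
    p = code[:q].count("1")          # number of ones in one period
    for s in range(q):
        regenerated = "".join(
            str(((i - s) * p) // q - ((i - s - 1) * p) // q) for i in range(1, n + 1)
        )
        if regenerated == code:
            return n, q, p, s
    return None


params = params_from_code("10010100")
```

For the example of this section the sketch returns N = 8, q = 5, p = 2, s = 1.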

The primary result of [Do-Sm] is a description of the
set of all lines whose digitization over the interval [0,N]
is a set of pixels specified by a chain code. This result
is of great importance for our registration accuracy
results since it provides a handle on the errors which may
arise by approximating the true edge by a feasible edge.
The set of lines is described by a quadrilateral in the
$(e, \alpha)$-plane, where e is the y-intercept of a line and $\alpha$ is
the slope. We will call this plane the dual plane. The
proof of the following formulas will be found in the next
section.


Define functions F and L by:

(2)  $F(s) = s - \lfloor s/q \rfloor\, q$

and

(3)  $L(s) = s + \lfloor (N - s)/q \rfloor\, q$

Let $\ell$ be defined by the equation

(4)  $1 + \lfloor \ell (p/q) \rfloor - \ell (p/q) = 1/q$  and  $0 < \ell < q$,

or, what is the same, by the fact that $\ell p \equiv -1 \pmod q$.

The set of feasible lines is a convex quadrilateral in
$(e, \alpha)$-space with vertices A, B, C, D given by

(5)  $A = (\lceil F(s)p/q \rceil - F(s)\,p^+/q^+,\; p^+/q^+)$

(6)  $B = (\lceil F(s)p/q \rceil - F(s)\,p/q,\; p/q)$

(7)  $C = (1 + \lfloor F(s+\ell)p/q \rfloor - F(s+\ell)\,p/q,\; p/q)$

(8)  $D = (1 + \lfloor F(s+\ell)p/q \rfloor - F(s+\ell)\,p^-/q^-,\; p^-/q^-)$

where

(9)  $q^+ = L(s+\ell) - F(s)$,  $p^+ = (pq^+ + 1)/q$

(10)  $q^- = L(s) - F(s+\ell)$,  $p^- = (pq^- - 1)/q$

The above expressions for the vertices of the feasible
quadrilateral will be discussed in greater detail in later
sections. We note here that neither the vertices A, C,
D nor the points in the two sides of the quadrilateral
determined by them correspond to lines that have the chain
code (N, q, p, s) after digitization. It is also very
important to note that (since we are working with lines of
non-negative slope < 1 and non-negative ordinate to the
origin < 1) the quantities $p^+$, $q^+$, $q^-$ are strictly
positive, while $p^- \ge 0$ (in fact, from (10) it follows that
$p^- = 0$ only if $p = q^- = 1$). This remark, which is omitted
in [Do-Sm], is crucial to provide a correct count of all
distinct digital lines of length N (cf. Proposition 10).
It will be proved in the next section.
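
Formulas (2) - (10) can be evaluated mechanically. The sketch below is an illustration only: the function name is hypothetical, it assumes q > 1, p and q relatively prime, and a parameter set that is actually the code of a digital line (so that $q^+$ and $q^-$ are positive), and it takes $q^-$ to be the abscissa difference between the last lattice point of B and the first lattice point of C, as in definition (18) of Section 3. Exact rationals are used for the dual coordinates.

```python
from fractions import Fraction
from math import gcd


def feasible_quadrilateral(n, q, p, s):
    """Vertices A, B, C, D as (intercept e, slope alpha) pairs of Fractions."""
    if q < 2 or gcd(p, q) != 1 or not (0 < p < q <= n and 0 <= s < q):
        raise ValueError("sketch assumes q > 1, gcd(p, q) = 1, 0 <= s < q <= n")

    def F(t):                       # (2): representative of t in 0 .. q-1
        return t % q

    def L(t):                       # (3): last abscissa <= n congruent to t mod q
        return t + ((n - t) // q) * q

    ell = next(l for l in range(1, q) if (l * p) % q == q - 1)   # (4)
    slope = Fraction(p, q)
    ceil_b = -((-F(s) * p) // q)            # ceil(F(s) p / q)
    floor_c = (F(s + ell) * p) // q         # floor(F(s + ell) p / q)
    q_plus = L(s + ell) - F(s)              # (9)
    q_minus = L(s) - F(s + ell)             # (10)
    if q_plus <= 0 or q_minus <= 0:
        raise ValueError("(n, q, p, s) is not the code of a digital line")
    p_plus = Fraction(p * q_plus + 1, q)
    p_minus = Fraction(p * q_minus - 1, q)
    A = (ceil_b - F(s) * p_plus / q_plus, p_plus / q_plus)                  # (5)
    B = (ceil_b - F(s) * slope, slope)                                      # (6)
    C = (1 + floor_c - F(s + ell) * slope, slope)                           # (7)
    D = (1 + floor_c - F(s + ell) * p_minus / q_minus, p_minus / q_minus)   # (8)
    return A, B, C, D


A, B, C, D = feasible_quadrilateral(8, 5, 2, 1)
```

For the chain code 10010100 of Section 2 this gives B = (3/5, 2/5) and C = (4/5, 2/5), with A of slope 3/7 and D of slope 1/3, so the slopes of all lines with this code lie between 1/3 and 3/7.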


Section 3.  Digital Line Formula Proofs

We provide here the proofs of the formulas (4) - (10)
of the previous section, since, as we pointed out, the manuscript
[Do-1] is unavailable and some errors might have occurred in the
original derivations of these formulas [Do-Sm].
The reader may safely skip this section without loss of
continuity.

We begin with an observation from [R-W] which remains
valid for lines of finite length N due to the fact that all
digital lines arise out of the digitization of lines of the
form $y = (p/q)x + m/q$, $0 \le m < q$, $p \wedge q = 1$. (We assume q > 1,
since for q = 1 we only consider the line y = 0.)

Lemma 1: For a line of slope p/q, vertical displacement
upwards by 1/q units results in a cyclic shift of the code
by $\ell$ digits to the right within each segment of length q,
where $\ell$ is the solution of the equation

(11)  $\ell p \equiv -1 \pmod q$,  $0 < \ell < q$.

Proof: For the purpose of this lemma we can consider a
digital line of infinite length generated by the line of
equation $y = (p/q)x + e$, $0 \le e < 1$. When e = 0 the line
contains exactly those points in the lattice $\mathbb{Z} \times \mathbb{Z}$ of the
form (kq, kp), $k \in \mathbb{Z}$. When e is increased the code remains
the same until new lattice points lie on the line; if a new
lattice point (q', p') appears for a value $e_0$, then one
gets a transposition of the 0 which corresponds to x = q'
and the 1 that corresponds to x = q' + 1 for values of
$e < e_0$ near $e_0$, as a quick look at the picture shows. The
points (q' + kq, p' + kp), $k \in \mathbb{Z}$, belong to the line
$y = (p/q)x + e_0$ and no other lattice point does, otherwise the
slope could not be p/q with $p \wedge q = 1$. Notice that because
of the upward shift we have 0 < q' < q and 0 < p' < p for
the first value $e_0$ where the above transposition takes
place. This implies that the code of the line $y = (p/q)x + e_0$
is the same as that of the line y = (p/q)x with a right
cyclic shift in each period of size q'. The same fact will
hold between any two successive upward shifts of the same
magnitude $e_0$. It remains to identify this magnitude $e_0$ and
the value q'. Since $e_0$ is the first positive value for
which such a shift occurs, we have that the parallelogram
of vertices (0,0), (q,p), (q',p') and (q+q', p+p') is a
minimal parallelogram on the lattice (see [H-W, p. 28]) and
hence its area is exactly one, i.e.

(12)  $p'q - q'p = q e_0 = 1$

From this it follows that $e_0 = 1/q$ and q' is the quantity
defined as $\ell$ by (11). After successive transitions of this
size (or, what is the same, after q successive cyclic shifts
of size $\ell$) we go back to the original code. ∎
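
Lemma 1 is easy to check numerically for small denominators. The following sketch, with hypothetical function names, builds one period of the code of $y = (p/q)x + m/q$ using integer arithmetic and compares the code for m + 1 with the code for m shifted cyclically to the right by $\ell$ digits. It is a sanity check under these assumptions, not a substitute for the proof.

```python
from math import gcd


def period_code(p, q, m):
    """One period (x = 1 .. q) of the chain code of y = (p/q) x + m/q."""
    return [(i * p + m) // q - ((i - 1) * p + m) // q for i in range(1, q + 1)]


def shift_length(p, q):
    """The solution l of (11): l * p = -1 (mod q) with 0 < l < q."""
    return next(l for l in range(1, q) if (l * p) % q == q - 1)


def lemma1_holds(p, q):
    """True if raising the line by 1/q always shifts the period right by l digits."""
    l = shift_length(p, q)
    for m in range(q - 1):
        before = period_code(p, q, m)
        after = period_code(p, q, m + 1)
        if after != before[-l:] + before[:-l]:   # right cyclic shift by l
            return False
    return True


checked = all(
    lemma1_holds(p, q)
    for q in range(2, 13)
    for p in range(1, q)
    if gcd(p, q) == 1
)
```

For p/q = 2/5, for instance, $\ell$ = 2, and the period 00101 of the line through the origin becomes 01001 when the line is raised by 1/5.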

We are now ready to relate the code (N,q,p,s) to the
family of lines that induce the same code. First, we know
that it is induced by a line of the form

(13)  $y = (p/q)x + m/q$,  $0 \le m < q$

and we would like to find the relation between s and m.
Lemma 1 tells us that

(14)  $\ell m \equiv s \pmod q$.

Hence, we have

(15)  $s = \ell m - kq$  for some $k \ge 0$;

in fact, using the function F introduced in (2) we can
write

$s = F(\ell m)$

since all the function F does is to select a representative
in 0,1,..., q-1 for every element in $\mathbb{Z}/q\mathbb{Z}$. Substituting
the expression (15) into (13) we obtain

(16)  $y = (p/q)s + m/q = (p \ell m)/q + m/q - kp = p'm - kp \in \mathbb{Z}$

where the third identity was obtained using (12) (recall
$\ell = q'$ in (12)). That is, we see that the value s has the
property that for x = s the point in the line (13) is a
lattice point. Furthermore, this is the first lattice point
in the interval $0 \le x < \infty$ which lies on the line, otherwise
the slope of the line will be rational with denominator
strictly smaller than q, in contradiction to the
fact that we are assuming that the chain code has smallest
period q (this justifies the letter F to denote the
function on $\mathbb{Z}/q\mathbb{Z}$ as defined by (2)). We can also conclude
that the value y in (16) is given by

$y = \lceil (p/q)s \rceil$  and  $m/q = \lceil (p/q)s \rceil - (p/q)s$

since $0 \le m/q < 1$. This tells us that the line (13)
coincides with the line B given by the dual coordinates
(6), i.e.

$e = \lceil F(s)(p/q) \rceil - F(s)(p/q)$,  $\alpha = p/q$.
As a corollary of this representation and Lemma 1 we obtain
that the line

$y = (p/q)x + (m+1)/q$

coincides with the line C described by (7); namely, the
infinite line will have first lattice point when
$x = F(s + \ell)$, and since $0 < (m+1)/q \le 1$, we have for the
corresponding value of y that

$y = \lfloor F(s + \ell)(p/q) \rfloor + 1$

hence it follows that the dual coordinates of C are in
fact

$e = 1 + \lfloor F(s + \ell)(p/q) \rfloor - F(s + \ell)(p/q)$,  $\alpha = p/q$.

We could by abuse of language denote the chain code of C as
$(N,q,p,F(s+\ell))$ but it might not be the case that q is the
smallest period of this code, as we have in the example:

B ↔ (11,11,3,0) ↔ y = (3/11)x

so that $\ell = 7$ and

C ↔ (11,11,3,7) ↔ y = (3/11)x + 1/11

which has code

0001 001 0001

whose smallest period is 7.
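
The example can be reproduced with a short sketch that computes $\ell$ from (11), the parameter $s = F(\ell m)$ from (14) - (15), the code of the line $y = (p/q)x + m/q$, and its smallest period by the sliding comparison of Section 2. All function names are hypothetical.

```python
def shift_length(p, q):
    """The solution l of (11): l * p = -1 (mod q) with 0 < l < q."""
    return next(l for l in range(1, q) if (l * p) % q == q - 1)


def offset_parameter(p, q, m):
    """s = F(l * m), i.e. l * m reduced to 0 .. q-1, as in (14)-(15)."""
    return (shift_length(p, q) * m) % q


def code_of_line(p, q, m, n):
    """Chain code of length n of the line y = (p/q) x + m/q."""
    return "".join(
        str((i * p + m) // q - ((i - 1) * p + m) // q) for i in range(1, n + 1)
    )


def smallest_period(code):
    """Smallest shift for which the overlapping parts of the code agree."""
    n = len(code)
    for j in range(1, n):
        if all(code[i] == code[i + j] for i in range(n - j)):
            return j
    return n


code_b = code_of_line(3, 11, 0, 11)   # line B of the example
code_c = code_of_line(3, 11, 1, 11)   # line C of the example
```

Here `code_b` has smallest period 11, while `code_c` is 00010010001 with smallest period 7, so (11,11,3,7) is not a legitimate parameter set even though the floor formula of Section 2 regenerates the same string from it.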

Let us call "last lattice point of a line" the one
with largest abscissa still <= N. We are going to consider
now two other lines defined by:

(17)  A: is the line passing through the first lattice
         point of the line B and the last lattice point of
         the line C.

(18)  D: is the line through the first lattice point of C
         and the last lattice point of B.

Those two lines are well-defined and not vertical since no
point of B coincides with a point of C and these lattice
points cannot be above each other. Neither of these two
lines nor the line C has code (N,q,p,s) since they pass
through lattice points different than those corresponding
to B. Let us first derive an important property of this
collection of four lines A,B,C,D. We note that if we have
two points in the dual space $L_1 = (e_1, \alpha_1)$, $L_2 = (e_2, \alpha_2)$
which correspond to lines with code (N,q,p,s) then the
point L of coordinates

$e = \lambda e_1 + (1-\lambda) e_2$,  $\alpha = \lambda \alpha_1 + (1-\lambda) \alpha_2$,  $0 \le \lambda \le 1$

corresponds to a line which passes through the same pixels
as $L_1$ and $L_2$; in fact, for a given x the ordinate y of the
corresponding point in the line L is just
$y = \lambda y_1 + (1-\lambda) y_2$, with $(x, y_1) \in L_1$, $(x, y_2) \in L_2$. So the set of
lines with code (N,q,p,s) forms a convex set in the dual
space. Furthermore, it is easy to see that this convex
set $\Omega$ must contain an open neighborhood of the open segment
defined by B and C; this is simply the fact that a line
between B and C passes through the same pixels as B but
passes through no lattice points (by Lemma 1), hence its
slope can be jiggled a bit and keep the same code.

We are going to look at the (possibly degenerate)
quadrilateral with vertices A,B,C,D. For that purpose we
need to find the equations of the sides, e.g. the side AB.
We are looking for the equation of a line in the dual
space, that is an equation of the form

$a\alpha + be = c$,  $a^2 + b^2 \ne 0$.

The definition of A shows that A and B have a point in
common, namely the first lattice point of the line B, say
$(x_0, y_0)$, and hence every line which corresponds to a point
in AB passes through the same point, i.e. it satisfies
the equation

$x_0 \alpha + e = y_0$

which has the desired form. Calling $(z_0, w_0)$ the last
lattice point of B, $(x_1, y_1)$ the first lattice point of
C, $(z_1, w_1)$ the last lattice point of C we have the following
equations

      AB:  $x_0 \alpha + e = y_0$
(19)  BD:  $z_0 \alpha + e = w_0$
      DC:  $x_1 \alpha + e = y_1$
      CA:  $z_1 \alpha + e = w_1$

We note that on one side of the line AB we have $x_0 \alpha + e > y_0$
and on the other side we have $x_0 \alpha + e < y_0$. On this second
side we have that no line passes through the same pixels as
B, hence it cannot have code (N,q,p,s), therefore
$\Omega \subseteq \{(e, \alpha) : x_0 \alpha + e \ge y_0\}$. We can conclude by a similar reasoning
that:

$\Omega \subseteq \{x_0 \alpha + e \ge y_0\} \cap \{z_0 \alpha + e \ge w_0\} \cap \{x_1 \alpha + e < y_1\} \cap \{z_1 \alpha + e < w_1\}$

which is the quadrilateral determined by A,B,C,D.

To finish the proof all we need to know is that the half-open segments (A,B] and [B,D) are in $\Omega$. For the first one it follows from the fact that there are no lattice points in the open triangle whose sides are the y-axis, the line A and the line B. Otherwise we consider the line through such a lattice point $(x_2, y_2)$ and $(x_0, y_0)$; it will have the same code as B but clearly has period $x_0 - x_2$, which is strictly less than q (recall $x_0 = s < q$). Similar reasoning holds for the other segment. Summarizing, we have:

Lemma 2: The convex set $\Omega$ of all lines coincides with the (possibly degenerate) quadrilateral of vertices A, B, C, D.

Lemma 3: It is never the case that A = D, i.e. it is impossible that we have simultaneously that $(x_0, y_0) = (z_0, w_0)$ and $(x_1, y_1) = (z_1, w_1)$.

Proof: In this case $\Omega$ is a triangle (it cannot be a segment since $A \in BC$ says that A is parallel to B, which contradicts (17)). One of the sides is BC. Hence $\Omega$ cannot contain an open neighborhood of the open segment (B,C); this contradicts an observation made above.

It remains to write down the dual coordinates of A and D. For that we need the abscissa of the last lattice point on the lines B and C. For the line B the abscissa of the first lattice point is x = s, hence the last lattice point has abscissa $x = s + kq$, $k \ge 0$, $x \le N$. This implies that $k = \lfloor (N-s)/q \rfloor$, so we get

$$x = s + \lfloor (N-s)/q \rfloor\, q = L(s) \quad \text{(as defined by (3))}.$$

Since the function L turns out to be well defined on $\mathbb{Z}/q\mathbb{Z}$, the abscissa of the last lattice point in C is $L(s+\ell)$. We get the following formulas, companion to (19), which we will need in a later section:

$$\begin{aligned}
x_0 &= F(s), & y_0 &= \lceil (p/q)F(s) \rceil = (p/q)F(s) + m/q,\\
z_0 &= L(s), & w_0 &= \lceil (p/q)L(s) \rceil = (p/q)L(s) + m/q,\\
x_1 &= F(s+\ell), & y_1 &= 1 + \lfloor (p/q)F(s+\ell) \rfloor = (p/q)F(s+\ell) + (m+1)/q,\\
z_1 &= L(s+\ell), & w_1 &= 1 + \lfloor (p/q)L(s+\ell) \rfloor = (p/q)L(s+\ell) + (m+1)/q.
\end{aligned} \tag{20}$$


The line A passes through the points $(x_0, y_0)$ and $(z_1, w_1)$, hence

$$\alpha = (w_1 - y_0)/(z_1 - x_0).$$

Define

$$q^+ = z_1 - x_0, \qquad p^+ = w_1 - y_0.$$

Then

$$q^+ = L(s+\ell) - F(s)$$

and $p^+ = w_1 - y_0 = (p/q)(z_1 - x_0) + 1/q = (p/q)q^+ + 1/q$, verifying (9) and also showing $p^+ \wedge q^+ = 1$ (that is, $p^+$ and $q^+$ are coprime). We already know that $q^+ \neq 0$; we want to show that Lemma 3 implies $q^+ > 0$. In fact, we have

$$q^+ = \ell + \lfloor (N-(s+\ell))/q \rfloor\, q$$

and the only problem could occur if $s + \ell > N$. Then we would have $N - s < q$ and $s + \ell > q$, which implies that $L(s) = F(s)$ and $F(s+\ell) = L(s+\ell)$. This is precisely the situation forbidden by Lemma 3. Now we want to find the ordinate to the origin of A. We have

$$A:\quad y - y_0 = (p^+/q^+)(x - x_0),$$

hence, using (20), we obtain

$$e = y_0 - (p^+/q^+)x_0 = \lceil F(s)(p/q) \rceil - F(s)(p^+/q^+).$$

This finishes the verification of (5). Going through the same reasoning for the line D we see that its slope is given by

$$\alpha = p^-/q^-, \qquad q^- = z_0 - x_1, \qquad p^- = w_0 - y_1,$$

so that $q^- = L(s) - F(s+\ell)$ as required and

$$p^- = (p/q)z_0 + m/q - \big((p/q)x_1 + (m+1)/q\big) = (p/q)(z_0 - x_1) - 1/q = (p/q)q^- - 1/q,$$

verifying the relation (10). Writing

$$D:\quad y - y_1 = (p^-/q^-)(x - x_1)$$

and using (20) again we get

$$e = y_1 - (p^-/q^-)x_1 = 1 + \lfloor (p/q)F(s+\ell) \rfloor - (p^-/q^-)F(s+\ell),$$

which is the only thing left to check in (8) except for seeing that $q^- > 0$. But this is again Lemma 3, since $q^- \le 0$ could only occur if simultaneously $N - s < q$ and $s + \ell < q$. A computation shows this leads to $F(s) = L(s)$ and $F(s+\ell) = L(s+\ell)$, which cannot happen.

Summarizing, we have proved:

Proposition 4: The formulas (2)-(10) defining the quadrilateral A, B, C, D are correct and furthermore $q^+ > 0$, $q^- > 0$, $p^+ > 0$ and $p^- \ge 0$.
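
As a concrete check of Proposition 4, the quantities above can be computed directly from a code (N,q,p,s). The Python sketch below is illustrative only. The function names are ours, and it assumes that $\ell$ is the residue with $p\ell \equiv -1 \pmod q$, which is what the integrality of $p^+ = (pq^+ + 1)/q$ requires; the defining formula (4) is not restated in this section.

```python
from math import gcd


def ell(p, q):
    """Assumed reading of (4): the residue l in [0, q) with p*l = -1 (mod q)."""
    for l in range(q):
        if (p * l + 1) % q == 0:
            return l
    raise ValueError("p and q must be coprime")


def first_abscissa(t, q):
    """F(t): abscissa of the first lattice point, i.e. the residue of t mod q."""
    return t % q


def last_abscissa(t, q, N):
    """L(t): largest abscissa <= N that is congruent to t mod q."""
    f = t % q
    return f + ((N - f) // q) * q


def neighbours(N, q, p, s):
    """Return (p_plus, q_plus, p_minus, q_minus), the slopes of the lines A and D."""
    l = ell(p, q)
    q_plus = last_abscissa(s + l, q, N) - first_abscissa(s, q)
    q_minus = last_abscissa(s, q, N) - first_abscissa(s + l, q)
    p_plus = (p * q_plus + 1) // q    # relation (9): p+ = (p/q) q+ + 1/q
    p_minus = (p * q_minus - 1) // q  # relation (10): p- = (p/q) q- - 1/q
    return p_plus, q_plus, p_minus, q_minus


p_plus, q_plus, p_minus, q_minus = neighbours(10, 3, 1, 0)
print(p_plus, q_plus, p_minus, q_minus)  # 3 8 2 7
```

For the code (10,3,1,0) the slopes are ordered as $2/7 < 1/3 < 3/8$, in agreement with $p^-/q^- < p/q < p^+/q^+$.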

Section  4.  Feasible  Region  Shape 
The description of the set of all lines whose digitization is a specified chain code of a straight line segment will now be used to obtain a worst-case bound on the subpixel accuracy with which we can locate a point in the image. We will show that given a period q chain code of the digitization of a straight line segment, there exists a real number x such that the total spread of y-values at the point x of all line segments with the given chain code is 1/q (see Fig. 2.2). Thus by selecting the midpoint of this set of (x,y)'s we have estimated the position of a point on the line to within an error of 1/(2q). This provides our error bound. In Section 5, we will examine the distribution of 1/(2q) corresponding to a probability distribution on lines.

To see the correctness of the 1/q spread, we first observe that the parallel lines B and C of the feasible region (Section 2) have slope p/q. We show that their vertical separation is 1/q. These lines may be thought of as providing a channel where we can find x values where the spread is 1/q. Next, the relationship between the location of the feasible region vertices in $(e,\alpha)$-space and the location of points on possible real line segments with the appropriate digitization is established. This will yield a polyhedral region in (x,y)-space which is the union of all feasible lines. Finally, we show that there exists a real number x such that the extent of the feasible region over x is determined only by the lines B and C, hence is of width 1/q.

The proof that B and C are 1/q units apart vertically is now given. In the case of the infinite digital line, the calculation that the spread is 1/q everywhere is straightforward. By passing to the finite case, we introduce boundary effects which cause the spread to be greater near the ends of the chain code, but the following lemma shows that at least one point of the 1/q width channel is preserved.

Lemma 5: Using the notation of Section 2, let B and C be the vertices of the feasible region for a chain code with parameters (N,q,p,s) corresponding to a straight line segment. Then the difference of the y-intercepts of the lines corresponding to C and B is 1/q.

Proof: It can be obtained from the way the line C was defined in Section 3 or otherwise from a direct computation, which we omit, using the ordinates to the origin that appear in (6) and (7). ∎

We have established that lines B and C are separated by a vertical distance 1/q. We have already pointed out in Section 3 that given an x value and the four lines A, B, C, D evaluated at x, the part of the feasible region lying over x is the convex hull of these four values.

The next step in finding a point $x_0$ at which the feasible region has height 1/q is to determine the way in which the lines A and D intersect the parallel lines B and C. We will show there is an interval $[a,b] \subseteq [0,N]$ such that the lines A and D lie between the lines B and C over the interval [a,b]. To do this, we establish the following facts (see Fig. 2.3):

Let I(.,.) denote the x-coordinate of the intersection of the two arguments.

1) The y-intercept of A is less than or equal to the y-intercept of D

2) The y-intercept of C is less than or equal to the y-intercept of D

3) $I(D,C) \le I(A,C)$

4) $I(A,B) \le I(D,B)$

5) $I(D,C) \le N$, $I(A,B) \le N$

From the diagram, we can see that selecting $a = \max(I(A,B), I(D,C))$ and $b = \min(I(A,C), I(B,D))$, the feasible region has height 1/q on the non-empty interval [a,b].


Lemma 6: The y-intercept of A is less than or equal to the y-intercept of B.

Proof: Denoting the y-intercepts by $Y_A$ and $Y_B$ we have

$$Y_B - Y_A = \lceil F(s)p/q \rceil - F(s)p/q - \lceil F(s)p/q \rceil + F(s)p^+/q^+ = F(s)\,(p^+/q^+ - p/q).$$

Since $F(s) = s \ge 0$, we are done if we show $p^+/q^+ - p/q > 0$. By the definition of $p^+$, $q^+$,

$$p^+/q^+ - p/q = (pq^+ + 1)/(qq^+) - p/q = p/q + 1/(qq^+) - p/q = 1/(qq^+).$$

By Proposition 4, we have $q^+ > 0$, hence we are done. ∎

Lemma 7: The y-intercept of D is greater than or equal to the y-intercept of C.

Proof: Denoting the y-intercepts by $Y_D$ and $Y_C$ we have, using the same type of arguments as in the previous lemma,

$$Y_D - Y_C = F(s+\ell)\,(p/q - p^-/q^-) = F(s+\ell)/(qq^-)$$

and we are in the same situation as in the previous lemma. ∎

Lemma 8: $I(D,C) \le I(A,C)$

Proof: By (17) and (18) we have that I(A,C) is the abscissa of the last lattice point of C, i.e. $I(A,C) = L(s+\ell)$, while I(D,C) is the abscissa of the first lattice point of C, $F(s+\ell)$. The conclusion of the lemma follows immediately. This can also be done by using the dual coordinates of A, D, C but at the cost of considerable computation. ∎

The same proof yields:

Lemma 9: $I(A,B) \le I(D,B)$

From what we have just said, it follows that

$$0 \le a = \max(I(A,B), I(D,C)) \le \min(I(A,C), I(B,D)) = b \le N,$$

hence we are guaranteed that there exists an $x \in [0,N]$ such that the feasible region over x has height 1/q. Therefore, if we pick the line $L_0$ which is the average of B and C, we have

$$\min_{0 \le x \le N}\ \max_{L \in (N,q,p,s)} |L(x) - L_0(x)| \le \frac{1}{2q}, \tag{21}$$

where $L(x)$, $L_0(x)$ represent the ordinate of the point in L, resp. $L_0$, with abscissa x.

The meaning of (21) is that given a digital line with period q in the sensed image and such that the underlying real edge has slope between zero and one, then we can determine the vertical aspect between sensed and reference images to an accuracy of 1/(2q) pixels.
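
The channel argument of this section can be checked numerically on a small example. The sketch below is not part of the original development: it builds the four lines A, B, C, D from the lattice points in (20), takes $a = \max(x_0, x_1)$ and $b = \min(z_0, z_1)$ (the intersection abscissas identified in Lemmas 8 and 9), and measures the vertical spread with exact rational arithmetic. As before, the residue $\ell$ is assumed to satisfy $p\ell \equiv -1 \pmod q$, and all names are illustrative.

```python
from fractions import Fraction
from math import ceil, floor


def channel(N, q, p, s):
    """Return the lines A, B, C, D as (slope, intercept) pairs and the interval [a, b]."""
    l = next(t for t in range(q) if (p * t + 1) % q == 0)  # assumed form of (4)
    F = lambda t: t % q
    L = lambda t: F(t) + ((N - F(t)) // q) * q
    x0, z0, x1, z1 = F(s), L(s), F(s + l), L(s + l)
    slope = Fraction(p, q)
    # Lattice points from (20): B lies on the ceilings, C one unit above the floors.
    y0, w0 = ceil(slope * x0), ceil(slope * z0)
    y1, w1 = 1 + floor(slope * x1), 1 + floor(slope * z1)
    slope_a = Fraction(w1 - y0, z1 - x0)   # p+/q+
    slope_d = Fraction(w0 - y1, z0 - x1)   # p-/q-
    lines = {
        "A": (slope_a, y0 - slope_a * x0),
        "B": (slope, y0 - slope * x0),
        "C": (slope, y1 - slope * x1),
        "D": (slope_d, y1 - slope_d * x1),
    }
    return lines, max(x0, x1), min(z0, z1)


def spread(lines, x):
    """Height of the feasible region over x: max minus min of the four lines."""
    ys = [m * x + e for m, e in lines.values()]
    return max(ys) - min(ys)


lines, a, b = channel(10, 3, 1, 0)
print(a, b)                              # 2 8
print(spread(lines, Fraction(5)))        # 1/3
print(spread(lines, Fraction(0)))        # 3/7
```

For the code (10,3,1,0) the spread equals 1/q = 1/3 throughout [2,8] and flares to 3/7 at x = 0, which is the boundary effect described above.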

Section 5. Infinite Digital  Lines 

The feasible region for infinite digital lines is easily computed using the results of Section 4. This analysis is divided into two parts. For any infinite digital line of period q, we show the channel consists of two parallel lines, which are a vertical distance 1/q apart. Thus, since the channel extends over the whole x-axis, there is no flaring at the end as in the finite case. If the infinite digital line is aperiodic, then we show the channel extends over the whole x-axis, but consists of a single line. Thus the maximum error is 1/(2q) if the digital line has period q and zero if the digital line is aperiodic. The aperiodic infinite digital lines are precisely those infinite digital lines which are the digitizations of lines with irrational slope. Since the irrationals are a set of measure one in the unit interval, using the uniform probability measure, we see that the error is zero with probability one for infinite digital lines.

Before considering the periodic and aperiodic lines separately, we note that any two infinite lines with the same digitization are parallel. Let y = mx + b and y = nx + c be two lines. Then the difference, h(x), in the y values of these lines at x is given by h(x) = (m-n)x + (b-c). If m and n are not equal then there exists a K > 0 such that |h(x)| > 1 for all x such that |x| > K. Thus the two lines cannot have the same digitization.

We now consider the case of infinite digital lines of period q. By the feasible region description in Section 2, the lines corresponding to the vertices A, B, C, and D of the feasible region in $(e,\alpha)$ space have slopes $p^-/q^-$, $p/q$, and $p^+/q^+$. Fixing p, q, and s and letting N go to infinity, we see that the above result on the slopes of infinite lines having the same digitization implies that $p^-/q^-$ and $p^+/q^+$ must approach p/q. Inserting these limits into the formulas for the vertices A and D, we see that, in the limit, A = B and C = D. We have shown in Section 4 that B and C are a vertical distance 1/q apart. This establishes the result for the infinite periodic digital line.


The infinite aperiodic line requires a different approach. We first cite a version of a classical result [H-W] on lines with irrational slope. Let f(x) = mx + b be a line with m irrational. Then the set $\{mx + b - \lfloor mx + b \rfloor : x \text{ is an integer}\}$ is dense in the unit interval. It has already been shown that two lines with the same digitization have the same slopes and can only vary in their y-intercepts. Let $\varepsilon > 0$ be given. Then the digitization, L, of the line y = mx + b (m irrational) is aperiodic, so there exist integers $K_1$ and $K_2$ such that $mK_1 + b - \lfloor mK_1 + b \rfloor < \varepsilon$ and $mK_2 + b - \lfloor mK_2 + b \rfloor > 1 - \varepsilon$. Thus decreasing b by more than $\varepsilon$ would change the digitization at $K_1$ and increasing b by more than $\varepsilon$ would change the digitization at $K_2$. Thus for any $\varepsilon > 0$, we cannot change b by more than $\varepsilon$ without changing the digitization. Hence b is fixed. Since m is also fixed, the channel is the single line y = mx + b.
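
The density result used above is easy to observe experimentally. The following sketch (illustrative names, floating-point arithmetic, and an arbitrarily chosen slope $\sqrt{2} - 1$) searches for integers $K_1$, $K_2$ whose fractional parts fall within $\varepsilon$ of 0 and of 1; it only illustrates the statement and is no substitute for the result cited from [H-W].

```python
import math


def frac(v):
    """Fractional part v - floor(v)."""
    return v - math.floor(v)


def witnesses(m, b, eps, limit=100_000):
    """Search 0 <= K < limit for K1 with frac(m*K1+b) < eps and K2 with frac(m*K2+b) > 1-eps."""
    k1 = k2 = None
    for k in range(limit):
        f = frac(m * k + b)
        if k1 is None and f < eps:
            k1 = k
        if k2 is None and f > 1 - eps:
            k2 = k
        if k1 is not None and k2 is not None:
            break
    return k1, k2


m, b, eps = math.sqrt(2) - 1, 0.3, 0.01
k1, k2 = witnesses(m, b, eps)
print(k1, frac(m * k1 + b))
print(k2, frac(m * k2 + b))
```

Shrinking eps forces the witnesses further out along the line, which is why the argument needs the line to be infinite.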

Section  6.  Invariant  Line  Measure 
A probabilistic analysis of geometric accuracy requires a probability distribution on the fundamental objects, the lines. It is tempting to place a uniform distribution on the coefficients of the lines represented in some parametric form. Unfortunately, there is no canonical parametrization and the measure will not be uniform with respect to other parametrizations. A customary escape from this quandary is to impose some parametrization-independent conditions which single out a probability measure. In geometric probability problems, one generally assumes the measure is invariant under translation and rotation of the geometric figures, in our case the lines. This uniquely determines a coordinate system, the $(p,\varphi)$ polar coordinates of a line, in which the distribution is uniform with respect to the parameters, as shown in [S, p. 28]. To write this measure in terms of the dual coordinates we appeal to the following figure:


[Figure: the line $y = \alpha x + e$ with its polar coordinates $(p,\varphi)$ and inclination $\theta$.]


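We clearly have

$$p = e\cos(\varphi - \pi/2) \quad \text{and} \quad \pi/2 - \theta = \pi - \varphi,$$

hence $p = e\cos\theta$ and

$$dp \wedge d\varphi = (\cos\theta\, de - e\sin\theta\, d\theta) \wedge d\theta = \cos\theta\, de \wedge d\theta.$$

Using $\tan\theta = \alpha$ we obtain $d\theta = \cos^2\theta\, d\alpha = (1+\alpha^2)^{-1}\, d\alpha$, so finally

$$dp \wedge d\varphi = (1+\alpha^2)^{-3/2}\, de \wedge d\alpha \tag{22}$$

is the invariant measure. We want to normalize (22) so that the total measure of $0 \le e < 1$, $0 \le \alpha < 1$ is exactly 1. From

$$\int (1+\alpha^2)^{-3/2}\, d\alpha = \alpha(1+\alpha^2)^{-1/2} \tag{23}$$

$$\int \alpha(1+\alpha^2)^{-3/2}\, d\alpha = -(1+\alpha^2)^{-1/2} \tag{24}$$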

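we obtain that the normalized invariant measure is:

$$d\mu = \sqrt{2}\,(1+\alpha^2)^{-3/2}\, de\, d\alpha. \tag{25}$$

It is now easy to compute $\int_\Omega f(e,\alpha)\, d\mu$, where $\Omega$ is the quadrilateral A, B, C, D formed by the lines of code (N,q,p,s). It is just necessary to recall the equations (19) and (20) of the sides of this quadrilateral:

$$\int_\Omega f\, d\mu = \sqrt{2}\left[\int_{p/q}^{p^+/q^+}\left(\int_{y_0 - \alpha x_0}^{w_1 - \alpha z_1} f(e,\alpha)\, de\right)\frac{d\alpha}{(1+\alpha^2)^{3/2}} + \int_{p^-/q^-}^{p/q}\left(\int_{w_0 - \alpha z_0}^{y_1 - \alpha x_1} f(e,\alpha)\, de\right)\frac{d\alpha}{(1+\alpha^2)^{3/2}}\right]. \tag{26}$$

In particular, using the definitions of $p^+$, $q^+$, $p^-$, $q^-$ which appear after (20):

$$\begin{aligned}
\mu(\Omega) &= \sqrt{2}\left[\int_{p/q}^{p^+/q^+}\big((w_1 - y_0) + \alpha(x_0 - z_1)\big)\frac{d\alpha}{(1+\alpha^2)^{3/2}} + \int_{p^-/q^-}^{p/q}\big((y_1 - w_0) + \alpha(z_0 - x_1)\big)\frac{d\alpha}{(1+\alpha^2)^{3/2}}\right]\\
&= \sqrt{2}\left[\int_{p/q}^{p^+/q^+}(p^+ - q^+\alpha)\frac{d\alpha}{(1+\alpha^2)^{3/2}} + \int_{p^-/q^-}^{p/q}(-p^- + \alpha q^-)\frac{d\alpha}{(1+\alpha^2)^{3/2}}\right]\\
&= \sqrt{2}\left[\big((p^+)^2 + (q^+)^2\big)^{1/2} - \frac{pp^+ + qq^+}{(p^2+q^2)^{1/2}} + \big((p^-)^2 + (q^-)^2\big)^{1/2} - \frac{pp^- + qq^-}{(p^2+q^2)^{1/2}}\right].
\end{aligned}$$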


Regretfully, this expression is a bit complicated. One can compare it with the Lebesgue measure of $\Omega$ without much difficulty and finds

$$\tfrac{1}{2}|\Omega| \le \mu(\Omega) \le \sqrt{2}\,|\Omega|.$$
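
A quick numerical cross-check of the closed form for $\mu(\Omega)$ is sketched below. It is only an illustration under stated assumptions: the helper names are ours, the integrals in $\alpha$ are approximated with a plain midpoint rule, and the sample values $p^+ = 3$, $q^+ = 8$, $p^- = 2$, $q^- = 7$ are those obtained for the code (10,3,1,0) when $\ell$ is taken with $p\ell \equiv -1 \pmod q$.

```python
from math import sqrt


def density(alpha):
    """Normalized invariant density (25) with respect to de d(alpha)."""
    return sqrt(2.0) * (1.0 + alpha * alpha) ** -1.5


def midpoint(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def mu_closed_form(p, q, p_plus, q_plus, p_minus, q_minus):
    """Closed form of mu(Omega) obtained by integrating (26) with f = 1."""
    r = sqrt(p * p + q * q)
    return sqrt(2.0) * (
        sqrt(p_plus ** 2 + q_plus ** 2) - (p * p_plus + q * q_plus) / r
        + sqrt(p_minus ** 2 + q_minus ** 2) - (p * p_minus + q * q_minus) / r
    )


def mu_numeric(p, q, p_plus, q_plus, p_minus, q_minus):
    """Same quantity, integrating the width in e of the quadrilateral over alpha."""
    upper = midpoint(lambda a: (p_plus - q_plus * a) * density(a),
                     p / q, p_plus / q_plus)
    lower = midpoint(lambda a: (q_minus * a - p_minus) * density(a),
                     p_minus / q_minus, p / q)
    return upper + lower


args = (1, 3, 3, 8, 2, 7)
print(mu_closed_form(*args), mu_numeric(*args))
print(midpoint(density, 0.0, 1.0))  # total mass of the unit square, close to 1
```

The two values agree to many digits, and the last line confirms that the factor $\sqrt{2}$ normalizes the measure on $0 \le e < 1$, $0 \le \alpha < 1$.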

In fact, for the analysis of the next section we would like to compare the $\mu$-measures or the Lebesgue measure with the measure on digital lines which assumes all of them are equally likely. Computations show that the invariant line measure tends to favor lines with small q (the horizontal line in particular has a rather large weight), while the uniform measure on digital lines tends to favor lines with higher q. The following table gives our computations for lines of length N = 10.


TABLE 6.1

DIGITAL LINE INFORMATION

  q    % TOTAL AREA    % PROBABILITY    % DIGITAL LINES
  1       18.182          17.469             0.74
  2        5.051           5.137             1.48
  3        7.684           7.809             4.44
  4        6.782           6.868             5.93
  5       14.250          14.438            14.81
  6        9.524           9.514             8.89
  7       19.444          19.626            26.67
  8        8.514           8.552            14.81
  9        7.684           7.720            16.30
 10        2.886           2.868             5.93

In the next section we discuss the error analysis of the procedure outlined by (21) under the measure that gives all digital lines equal weight; we leave for later the comparison with other error estimates based on the invariant measure.

Section  7. Digital  Line-Probabilistic  Analysis 

A worst case bound on registration accuracy using a digital edge was developed in Section 4. More realistic error information can be obtained using probability. In this section we consider the question of obtaining probabilistic information on the registration error assuming the real world edge giving rise to the digital edge is generated by a natural distribution on edges. We have procedures for estimating these probabilities, but due to the considerable computational cost involved in evaluating these in special cases, we prefer to first seek analytical simplifications.

Many  probabilistic  questions  pertinent  to  the  geometric 
accuracy  question  can  be  formulated.  Several  of  the  most 
basic  are: 

1)  Given  a maximum  allowed  registration  error,  what  is 
the  probability  that  the  actual  error  will  not 
exceed  this? 

2)  What  is  the  expected  value  and  the  variance  of  the 
registration  error? 

3)  Given  a maximum  allowed  registration  error  and  a 
maximum  allowed  probability  of  error  find  the 
largest  region  of  lines  (in  some  sense)  such  that 
lines  coming  from  this  region  will  result  in  an 
acceptable  size  error  an  acceptable  percentage  of 
the  time? 

We now turn to an analysis of the first question. We wish to determine, for any acceptable error level in the estimated offset between sensed and reference image, what is the probability that a random edge will result in a digitization which permits estimation to less than that error level. Though a simple formula for these probabilities as a function of digital line length is not available, a procedure for calculating these probabilities for any given line length, N, is described and results for the case N = 10 are presented. In addition we present asymptotic upper bounds on the error.

The basic approach to computing the error probabilities is quite simple. A probability density function is given on the set, A, of all lines with slope between 0 and 1, going through the pixel with lower left vertex (0,0). Since a line has only one chain code, the sets of lines with different chain codes give a partition of the set A. Hence the density on lines induces a density on chain codes. For a chain code with period q, the maximum error is 1/(2q), as was shown in Section 4. Thus for any specified error h, we must calculate the probability of the following set, B, of line chain codes:

$$B = \{(N,q,p,s) : 1/(2q) < h\}.$$

The set of all linear chain codes of length N can be enumerated. For each chain code in B, the corresponding feasible quadrilateral can be calculated as in Section 2. The density function on lines can then be integrated over the quadrilateral and the sum of these integrals over all members in B computed. This sum yields the desired probability.
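
For the simplest measure considered here, the one giving every digital line of length N equal weight, the integral over each quadrilateral reduces to a count, and the procedure becomes a short sum. The sketch below illustrates this special case only; it uses the counts L(N,q) for N = 10 listed in Table 7.1 of this section, and the function name and the use of exact fractions are our own choices.

```python
from fractions import Fraction

# L(N, q) for N = 10 and q = 1..10, copied from Table 7.1.
COUNTS_N10 = {1: 1, 2: 2, 3: 6, 4: 8, 5: 20, 6: 12, 7: 36, 8: 20, 9: 22, 10: 8}


def prob_error_below(h, counts=COUNTS_N10):
    """P(1/(2q) < h) when all digital lines are equally likely.

    h should be an exact Fraction so that the strict inequality is not
    disturbed by floating-point rounding.
    """
    total = sum(counts.values())
    good = sum(c for q, c in counts.items() if Fraction(1, 2 * q) < h)
    return Fraction(good, total)


print(prob_error_below(Fraction(1, 10)))  # 98/135
print(prob_error_below(Fraction(1, 4)))   # 44/45
```

With h = 1/10 only the periods q > 5 qualify, which is 98 of the 135 digital lines of length 10.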

The problem of enumerating linear chain codes of lines through the origin was discussed in [R-W], where an algorithm for generating the set of linear chain codes was also presented. We have not found any estimates in the literature of the number of chain codes of a given length. The problem is that the shortest period of the digital line of length N corresponding to a line

$$y = (p/q)x + m/q$$

might be strictly smaller than q. Since such lines generate all the possible digital lines and we can associate to each a code (N,q,p,s), the problem reduces to characterizing those values of s for which this code does not coincide with a code $(N,q',p',s')$ with $q' < q$. The answer lies in the following.

Proposition 10: Given a code (N,q,p,s), the necessary and sufficient condition that it does not coincide with a code of strictly smaller period is that $q^+ > 0$ and $q^- > 0$, where $q^+$, $q^-$ are defined by (9) and (10).

Proof: The necessity of this condition is guaranteed by Proposition 4. In order to go further we have to analyze what condition on s ensures that $q^+ > 0$ and $q^- > 0$. We have

$$q^+ = L(s+\ell) - F(s) = s + \ell + \lfloor (N-(s+\ell))/q \rfloor\, q - s = \ell + \lfloor (N-(s+\ell))/q \rfloor\, q,$$

$$q^- = L(s) - F(s+\ell) = s + \lfloor (N-s)/q \rfloor\, q - (s+\ell) + \lfloor (s+\ell)/q \rfloor\, q = \lfloor (N-s)/q \rfloor\, q + \lfloor (s+\ell)/q \rfloor\, q - \ell.$$

Note that if $N - s \ge q$ then the digital line has period q, since the digits in the chain code corresponding to $x = s+1, \ldots, x = s+q \le N$ form the chain code of the standard line, i.e. of $y = (p/q)x$. Of course in this case we also have $N - s - \ell \ge q - \ell > 0$, hence $q^+ \ge \ell > 0$ and $q^- \ge q - \ell > 0$. Suppose now $N - s < q$. Then the condition $q^- > 0$ implies that $s + \ell \ge q$ and hence we have $F(s+\ell) = s + \ell - q$. Note that

$$N - F(s+\ell) = N - (s+\ell) + q \ge q$$

since $q^+ > 0$ implies that $N - (s+\ell) \ge 0$. So we only have to prove that a line for which $N - s < q$ and $N - F(s+\ell) \ge q$ has smallest period q. Notice that this says that the line $y = (p/q)x + m/q$ passes through a single lattice point at x = s, while the line $y = (p/q)x + (m+1)/q$ contains two lattice points, the first one with abscissa $F(s+\ell) < s$. We know hence that this second line has period exactly q, since if we restrict ourselves to $F(s+\ell) + 1 \le x \le F(s+\ell) + q \le N$ the q digits in the chain code of the second line are those of the standard chain code. To prove that the original line has smallest period q it is enough to show that the same portion of its chain code has smallest period q, since the period of a chain code cannot be smaller than that of any subchain. That is, we have reduced ourselves to showing that the chain code of the digital line of length q corresponding to $y = (p/q)x + (q-1)/q$ has smallest period q. Calling $c_j$ the standard chain code and $c^*_j$ the chain code of this other line, it would be enough to prove that the sequence $\{c^*_1, \ldots, c^*_q\}$ is exactly the sequence $\{c_q, \ldots, c_1\}$, making an appeal to (1). Now, as we have argued in Lemma 1, the code $\{c^*_1, \ldots, c^*_q\}$ is obtained from $\{c_1, \ldots, c_q\}$ by making $c^*_1 = 1$, $c^*_q = 0$, $c^*_j = c_j$, $1 < j \le q-1$, while $c_1 = 0$, $c_q = 1$. To finish the proof we only need to show that the sequence $c_2, \ldots, c_{q-1}$ is symmetric, i.e.

$$c_j = c_{q-j+1}.$$

But $c_j = \lfloor (p/q)j \rfloor - \lfloor (p/q)(j-1) \rfloor$ and

$$c_{q-j+1} = \lfloor (p/q)(q-j+1) \rfloor - \lfloor (p/q)(q-j) \rfloor = p + \lfloor -(p/q)(j-1) \rfloor - \big(p + \lfloor -(p/q)j \rfloor\big).$$

As long as x is not an integer we have $\lfloor -x \rfloor = -\lfloor x \rfloor - 1$. But $2 \le j \le q-1$ indicates that neither $(p/q)(j-1)$ nor $(p/q)j$ is an integer, hence

$$c_{q-j+1} = -\lfloor (p/q)(j-1) \rfloor - 1 - \big(-\lfloor (p/q)j \rfloor - 1\big) = c_j.$$

Due to the above characterization of $c^*_j$ we have

$$c^*_j = c_{q-j+1}, \qquad 1 \le j \le q,$$

which shows that the digital line of length N corresponding to $y = (p/q)x + (q-1)/q$ has smallest period q, and hence the same is true for the original line. ∎
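
The reversal property at the heart of this proof can be checked mechanically for small periods. The sketch below is illustrative: the digit formula for the standard code is the one used in the proof, while the digit formula for the line $y = (p/q)x + (q-1)/q$ is our assumption, obtained by applying the same floor-difference rule to the shifted line.

```python
from math import gcd


def standard_code(p, q):
    """Digits c_1..c_q of the standard line y = (p/q) x."""
    return [(p * j) // q - (p * (j - 1)) // q for j in range(1, q + 1)]


def shifted_code(p, q):
    """Digits c*_1..c*_q of y = (p/q) x + (q-1)/q (same floor-difference rule, assumed)."""
    return [(p * j + q - 1) // q - (p * (j - 1) + q - 1) // q
            for j in range(1, q + 1)]


def reversal_holds(p, q):
    """True if c*_j = c_{q-j+1} for 1 <= j <= q."""
    return shifted_code(p, q) == standard_code(p, q)[::-1]


print(standard_code(2, 5))  # [0, 0, 1, 0, 1]
print(shifted_code(2, 5))   # [1, 0, 1, 0, 0]
print(all(reversal_holds(p, q)
          for q in range(2, 40) for p in range(1, q) if gcd(p, q) == 1))  # True
```

For p/q = 2/5 the inner digits $c_2, c_3, c_4$ = 0, 1, 0 are symmetric, as the argument requires.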

Proposition 10 and its proof give us a way to compute the number L(N,q) of digital lines of length N and smallest period q. In fact L(N,1) = 1, so we can consider q > 1. Then the situation $N - s < q$ can only arise if $N \le q + s - 1 \le 2q - 2$, that is, $(N+2)/2 \le q$. Hence, if $q < (N+2)/2$, s can take arbitrary values and it follows that

$$L(N,q) = q\,\phi(q) \quad \text{for } 2 \le q < (N+2)/2, \tag{27}$$

where $\phi(q)$ is the Euler function that counts the number of values p, $1 \le p \le q$, $p \wedge q = 1$. This formula is clearly valid for q = 1 since $\phi(1) = 1$. In the remaining range of q we can use that when p runs over all the values considered in $\phi(q)$, so does $\ell$, where we remind the reader that $\ell$ is defined by (4). We fix $\ell$ and divide the range of s into two classes:

$$0 \le s \le N-q, \qquad N-q+1 \le s \le q-1.$$

The second class is not empty since we are assuming $N + 2 \le 2q$. In the first class every line has smallest period q; this accounts for $N - q + 1$ lines. In the second class we have two subclasses, $s + \ell < q$ and $q \le s + \ell$. The first one cannot introduce any lines of period q due to the condition $q^- > 0$.
In the second one we have to consider whether

$$N - (s + \ell - q) \ge q$$

or not. Only if this inequality is true do we get new lines (due to the condition $q^+ > 0$). Hence we must have

$$\max\{q - \ell,\ N - q + 1\} \le s \le \min\{q - 1,\ N - \ell\},$$

which gives us $1 + \min\{\ell - 1,\ N - q,\ 2q - N - 2,\ q - \ell - 1\}$ lines (notice that this minimum is non-negative). Therefore, in this range of values of q we have

$$L(N,q) = (N - q + 2)\,\phi(q) + \sum_{\ell} \min\{2q - N - 2,\ q - \ell - 1,\ \ell - 1,\ N - q\}, \tag{28}$$

where the sum takes place over all values $\ell$, $1 \le \ell \le q-1$, $\ell \wedge q = 1$. Since this expression is a little bit hard to work with, we can use upper and lower estimates: $L^{**}(N,q) = q\,\phi(q)$, $L_*(N,q) = (N - q + 2)\,\phi(q)$ for q in this range. Finally, setting L(N) = total number of digital lines of length N, we get the estimates

$$L_*(N) = \sum_{q=1}^{\lceil N/2 \rceil} q\,\phi(q) + \sum_{q=(N/2)+1}^{N} (N - q + 2)\,\phi(q) \ \le\ L(N) \ \le\ L^{**}(N) = \sum_{q=1}^{N} q\,\phi(q). \tag{29}$$

Using  the  above  formulas  we  can  produce  the  following  table 
for  N = 10 


TABLE 7.1

    q    φ(q)   L_*(N,q)   L(N,q)   L**(N,q)
    1      1        1         1         1
    2      1        2         2         2
    3      2        6         6         6
    4      2        8         8         8
    5      4       20        20        20
    6      2       12        12        12
    7      6       30        36        42
    8      4       16        20        32
    9      6       18        22        54
   10      4        8         8        40
 TOTAL            121       135       217
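
The entries of Table 7.1 can be reproduced directly from (27) and (28). The following sketch is one straightforward transcription of those formulas; the function names are ours, and the Euler function is computed by brute force, which is adequate for small N.

```python
from math import gcd


def phi(q):
    """Euler function: number of p in 1..q coprime to q."""
    return sum(1 for p in range(1, q + 1) if gcd(p, q) == 1)


def count_exact(N, q):
    """L(N, q): formula (27) for q < (N+2)/2, formula (28) otherwise."""
    if 2 * q < N + 2:
        return q * phi(q)
    extra = sum(min(2 * q - N - 2, q - l - 1, l - 1, N - q)
                for l in range(1, q) if gcd(l, q) == 1)
    return (N - q + 2) * phi(q) + extra


def count_lower(N, q):
    """L_*(N, q): drop the sum of minima in (28)."""
    return q * phi(q) if 2 * q < N + 2 else (N - q + 2) * phi(q)


def count_upper(N, q):
    """L**(N, q): the count if no line dropped its period."""
    return q * phi(q)


N = 10
for q in range(1, N + 1):
    print(q, phi(q), count_lower(N, q), count_exact(N, q), count_upper(N, q))
print(sum(count_exact(N, q) for q in range(1, N + 1)))  # 135
```

The column sums come out as 121, 135 and 217, matching the totals of the table.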

We notice that L(N) is fairly close to $L_*(N)$ and very different from $L^{**}(N)$. $L^{**}(N)$ would have been the count if no digital lines dropped their period when considered to have finite length. Since we want to develop some asymptotic bounds for the error of the choice (21) for subpixel accuracy, we introduce a different upper bound function $L^*(N,q)$ defined as follows:

$$\begin{aligned}
L^*(N,q) &= L(N,q), & 1 \le q \le \lceil N/2 \rceil,\\
L^*(N,q) &= L_*(N,q) + (2q - N - 2)(\phi(q) - 2), & (N/2) + 1 \le q \le (2/3)N + 2/3,\\
L^*(N,q) &= L_*(N,q) + (N - q)(\phi(q) - 2), & (2/3)N + 2/3 < q \le N.
\end{aligned} \tag{30}$$


The choice [...] terms [...]. Since the [...] only [...] have $L^*(N,q)$ [...] integer) values to [...] integer) and for q = N. For N = 10, we have only three

$$L^*(N,7) = 38, \qquad L^*(N,8) = 20, \qquad L^*(N,9) = 22.$$

(We have used $L^*(N) = \sum_{q=1}^{N} L^*(N,q)$.)
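
The three values just quoted follow from definition (30). A minimal sketch, with illustrative names and with the range tests written in integer arithmetic ($2q < N + 2$ for the first range, $3q \le 2N + 2$ for the second), is:

```python
from math import gcd


def phi(q):
    """Euler function: number of p in 1..q coprime to q."""
    return sum(1 for p in range(1, q + 1) if gcd(p, q) == 1)


def count_star(N, q):
    """Upper bound L*(N, q) of definition (30)."""
    f = phi(q)
    if 2 * q < N + 2:            # first range: L* = L(N, q) = q * phi(q) by (27)
        return q * f
    lower = (N - q + 2) * f      # L_*(N, q)
    if 3 * q <= 2 * N + 2:       # q <= (2/3) N + 2/3
        return lower + (2 * q - N - 2) * (f - 2)
    return lower + (N - q) * (f - 2)


print([count_star(10, q) for q in (7, 8, 9)])  # [38, 20, 22]
```

The factor $\phi(q) - 2$ reflects that the minimum in (28) vanishes for $\ell = 1$ and $\ell = q - 1$, so at most $\phi(q) - 2$ terms of the sum can contribute.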

PROPOSITION 11: The exact number of digital lines is given by the formula $L(N) = \sum_{q=1}^{N} L(N,q)$, with L(N,q) defined by (27) and (28). It satisfies the inequalities

$$L_*(N) \le L(N) \le L^*(N),$$

where the functions $L_*(N)$, $L^*(N)$ have been defined above and satisfy the asymptotic estimates:

$$L_*(N) = \frac{3}{4\pi^2}N^3 + O(N^2\log N) \approx 0.076\,N^3 \tag{31}$$

$$L^*(N) = \frac{10}{9\pi^2}N^3 + O(N^2\log N) \approx 0.112\,N^3 \tag{32}$$

PROOF: We only have to prove the estimates (31) and (32). We use the methods used in [H-W] to prove the following asymptotic formula:

$$\Phi(N) = \sum_{q=1}^{N} \phi(q) = \frac{3N^2}{\pi^2} + O(N\log N). \tag{33}$$

The idea is to write, using the Moebius function $\mu$,

$$\phi(q) = q\sum_{d \mid q} \mu(d)/d. \tag{34}$$

It will be useful to find first the asymptotics of $L^{**}(N)$. For any N,

$$L^{**}(N) = \sum_{q=1}^{N} q\,\phi(q) = \sum_{q=1}^{N} q^2 \sum_{d \mid q} \mu(d)/d.$$

We now write q = dd' and substitute in the last term:

$$L^{**}(N) = \sum_{dd' \le N} d^2 (d')^2 \mu(d)/d = \sum_{d=1}^{N} d\,\mu(d) \sum_{d' \le N/d} (d')^2.$$

The inner term is $\sum_{d' \le N/d} (d')^2 = N^3/(3d^3) + O(N^2/d^2)$. Inserting this into $L^{**}(N)$, we obtain

$$L^{**}(N) = (1/3)N^3 \sum_{d=1}^{N} \mu(d)/d^2 + O(N^2\log N).$$

Note we have used

$$\Big|\sum_{d=1}^{N} d\,\mu(d)\,N^2/d^2\Big| \le N^2 \sum_{d=1}^{N} 1/d = O(N^2\log N).$$

But we have $\sum_{d=1}^{\infty} \mu(d)/d^2 = 6/\pi^2$ [H-W]. Hence $\sum_{d=1}^{N} \mu(d)/d^2 = 6/\pi^2 + O(1/N)$. Substituting this into $L^{**}(N)$, we get

$$L^{**}(N) = \frac{2N^3}{\pi^2} + O(N^2\log N). \tag{35}$$


We can now get the asymptotic formula for $L_*(N)$. Recall that we have, from (29),

$$L_*(N) = \frac{N^3}{4\pi^2} + O(N^2\log N) + (N+2)\sum_{q=(N/2)+1}^{N} \phi(q) - \sum_{q=(N/2)+1}^{N} q\,\phi(q).$$

We can write

$$\sum_{q=(N/2)+1}^{N} \phi(q) = \Phi(N) - \Phi(N/2) = \frac{3N^2}{\pi^2} - \frac{3N^2}{4\pi^2} + O(N\log N) = \frac{9N^2}{4\pi^2} + O(N\log N).$$

Similarly

$$\sum_{q=(N/2)+1}^{N} q\,\phi(q) = L^{**}(N) - L^{**}(N/2) = \frac{2N^3}{\pi^2} - \frac{2}{\pi^2}\Big(\frac{N}{2}\Big)^3 + O(N^2\log N) = \frac{7N^3}{4\pi^2} + O(N^2\log N),$$

so that we finally get $L_*(N) = 3N^3/(4\pi^2) + O(N^2\log N)$.

Using  the  definition  (30)  we  obtain: 

(2/3)N+2/3  (2/3)N+2/3 

L (N)  = Z q<*(q)  - 2 Z (2q-N-2) 

1 (N/ 2 ) + 1 


I 
1 

N N 

* (2N  +2)  ST  ljl(q)  " 2 Z q£(q) 

(2/3)N+2/3  ( 2/ 3 )N  + 2/3 

N 

- 2 Z (N-q) 

(2/3)N+2/3 

* L**((2/3)N+2/3)  +2N(§(N)  - $( ( 2/ 3 )N+2/ 3 ) ) 

- 2(L**(N)  - L*((2/3)N  + 2/3))  +0(N2) 

We  introduce  now  (33)  and  (34)  into  this  expression; 

L*(N)  = (6/'tt,’Z)((2/3)N)3  - 4N3/'Ti''Z  + 6NS/'h'’1' 
-(6N/fl/Z)((2/3)N)2  + 0(N21orN) 

= (lO/OfT1)  )N*  + 0 (N^logN ) 

We  note  that  L . (N ) * 0 . 076N  and  L*(N ) a 0 . 1 1 2N  if  we 
disregard  the  0(N  logN)  term,  for  N - 10  these  approxima- 
tions are  not  very  good*  Nevertheless  for  the  coming 
estimates  it  is  only  the  leading  term  that  counts* 

Remark: On purely heuristic grounds one can propose an approximate formula $\tilde{L}(N)$ to the correct value $L(N)$. It consists in assuming that the values that appear in (28) are uniformly distributed with density $\phi(q)/q$. Then

$$\tilde{L}(N) = L_*(N) + \sum_{q=(N/2)+1}^{(2/3)N+2/3} (\phi(q)/q)(2q-N-2)(N-q) + \sum_{q=(2N/3)+1}^{N} (\phi(q)/q)(N-q)(2q-N).$$

It is clear that $L_*(N) \le \tilde{L}(N)$ and also $\tilde{L}(N) \le L^*(N)$, since N-q < q and 2q-N < q in both sums. It is not apparent how to find the correct relation between $\tilde{L}(N)$ and $L(N)$ but we note that for N = 10 we immediately get from Table 7.1 the remarkable value

$$\tilde{L}(10) = 135.47$$

Besides, one can show, by the same methods used in Proposition 11, that the following asymptotic development holds

$$\tilde{L}(N) = N^3/\pi^2 + O(N^2 \log N)$$

which fits right between the values in Proposition 11. It is tempting to conjecture that $L(N)$ has the same asymptotic behavior. In fact, we computed $L(N)$, using (27) and (28), and $\tilde{L}(N)$ for N = 100 and found the following values

$L(N) = 104,359$

$\tilde{L}(N) = 104,949$

$L(N)/N^3 = 0.104359$

$1/\pi^2 = 0.101321$

which clearly reinforces the conjecture.

Let S(N) be given by

(36) $S(N) = \sum_{q=1}^{N} (1/q) L(N,q)$

Then the offset error incurred by using the line parallel to B passing through the middle of the channel is given by

(37) $E(N) = ((1/2)S(N))/L(N)$

when we use the uniform distribution on digital lines.

PROPOSITION 12: Up to terms of the form $O((\log N)/N^2)$ the offset error defined in (37) satisfies the estimates

(38) $(29/40)(1/N) \le E(N) \le (59/54)(1/N)$

PROOF: We start with the lower bound for E(N). It is clear that $E(N) \ge 1/(2N)$. To improve on this we note that up to q = N/2 the sum of the terms in S(N) is exactly $\Phi(N/2)$, hence

$$2E(N) = \frac{\Phi(N/2) + \sum_{q=(N/2)+1}^{N} (1/q)L(N,q)}{L(N)} \ge \frac{\Phi(N/2) + (L(N) - L^{**}(N/2))/N}{L(N)} = \frac{1}{N} \cdot \frac{N\Phi(N/2) + (L(N) - L^{**}(N/2))}{L^{**}(N/2) + (L(N) - L^{**}(N/2))}.$$

Now $N\Phi(N/2) > L^{**}(N/2)$ because in S(N) we divide by q, and here we are considering $1 \le q \le N/2$ only. It is easy to see that the function (a+x)/(b+x) is strictly decreasing if a > b, hence the above expression diminishes if we replace $L(N)$ by $L^*(N)$ and we obtain

$$2E(N) \ge \frac{1}{N} \cdot \frac{N\Phi(N/2) + (L^*(N) - L^{**}(N/2))}{L^{**}(N/2) + (L^*(N) - L^{**}(N/2))} = \frac{\Phi(N/2) + (L^*(N) - L^{**}(N/2))/N}{L^*(N)}$$

$$= \frac{(3/\pi^2)(N^2/4) + (((10/9)N^3)/\pi^2 - (2/\pi^2)(N^3/8))/N}{((10/9)N^3)/\pi^2} + O(\log N/N^2) = (29/20)(1/N) + O(\log N/N^2).$$

Therefore we get

$$E(N) \ge (29/40)(1/N) + O(\log N/N^2).$$

Let us now work out an upper bound for E(N). We use a slightly more complicated method. Replacing $L(N,q)$ by $L^*(N,q)$ in the expression of S(N) we have

$$S^*(N) = \sum_{q=1}^{(2/3)N+2/3} \phi(q) - 2 \sum_{q=(2/3)N+2/3}^{N} \phi(q) + 2N \sum_{q=(2/3)N+2/3}^{N} (\phi(q)/q) + O(N).$$

The only new difficulty consists in estimating the term $\sum (\phi(q)/q)$. Using the formula (34) we get

$$\sum_{q=(2/3)N+2/3}^{N} (\phi(q)/q) = \sum_{q} \sum_{d|q} \mu(d)/d = \sum_{d=1}^{N} (\mu(d)/d)((1/3)(N/d) + O(1/d))$$

since by writing q = dd', we get $((2/3)N + 2/3)/d \le d' \le N/d$. By the same argument we used to obtain (35) we see that this term is exactly

$$(N/3)(6/\pi^2) + O(1).$$

The first two terms in $S^*(N)$ can be computed using (33), and we finally get

$$S^*(N) = 2N^2/\pi^2 + O(N).$$

On the other hand,

$$S(N) + (L^*(N) - L(N))/N \le S^*(N).$$

Dividing by $L^*(N)$ we obtain (up to $O(\log N/N^2)$)

$$(L(N)/L^*(N))((S(N)/L(N)) - 1/N) + 1/N \le (18/10)(1/N).$$

But

$$L(N)/L^*(N) \ge L_*(N)/L^*(N) = 27/40,$$

hence $2E(N) - 1/N \le (8/10)(40/27)(1/N)$,

which leads to the estimate (38).
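The constants 29/40 and 59/54 in (38) follow from the leading coefficients alone, so the arithmetic can be replayed exactly with rational numbers. The sketch below is only a check of that arithmetic; the variable and function names are illustrative, and all coefficients are expressed in units of $N^k/\pi^2$ so that the factors of $\pi^2$ cancel.

```python
from fractions import Fraction as F

# Leading coefficients, in units of N^k / pi^2, taken from the text:
PHI = F(3)        # Phi(N)  ~ 3 N^2 / pi^2        (33)
LSS = F(2)        # L**(N)  ~ 2 N^3 / pi^2        (35)
L_LOW = F(3, 4)   # L_*(N)  ~ (3/4) N^3 / pi^2    (31)
L_UP = F(10, 9)   # L^*(N)  ~ (10/9) N^3 / pi^2   (32)
S_UP = F(2)       # S^*(N)  ~ 2 N^2 / pi^2


def lower_bound_coeff():
    """Coefficient c in E(N) >= c/N, from the lower-bound argument."""
    # 2E(N) >= [Phi(N/2) + (L^*(N) - L**(N/2))/N] / L^*(N)
    numerator = PHI / 4 + (L_UP - LSS / 8)
    return numerator / L_UP / 2


def upper_bound_coeff():
    """Coefficient c in E(N) <= c/N, from the upper-bound argument."""
    rhs = S_UP / L_UP        # S^*/L^* = 18/10 (times 1/N)
    ratio = L_LOW / L_UP     # L_*/L^* = 27/40
    two_e = 1 + (rhs - 1) / ratio
    return two_e / 2


if __name__ == "__main__":
    print(lower_bound_coeff(), upper_bound_coeff())
```

Using exact fractions rather than floating point makes it easy to see that the two results are exactly 29/40 and 59/54.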

REMARK: Corresponding to the heuristic estimate $\tilde{L}(N)$ given above for the correct number of lines $L(N)$ we can construct a heuristic formula for the asymptotic error, $\tilde{E}(N)$,

$$\tilde{E}(N) = (1/2)\,\tilde{S}(N)/\tilde{L}(N),$$

where

$$\tilde{S}(N) = \sum_{q=1}^{N/2} \phi(q) + \sum_{q=(N/2)+1}^{N} \phi(q)\left((4N+2)/q - 3 - N^2/q^2\right).$$

One finds, using the same type of reasoning as in Proposition 12,

$$\tilde{S}(N) = (6(1 - \log 2)N^2)/\pi^2 + O(N \log N)$$

$$\tilde{E}(N) = 3(1 - \log 2)(1/N) + O(\log N/N^2) \approx 0.92/N$$

which is in fact in tune with the lower and upper bounds obtained in Proposition 12, namely 0.72/N and 1.09/N respectively. It would be very interesting to show that $E(N)$ has the same asymptotic behavior as $\tilde{E}(N)$.

We remark that though the asymptotic behavior of the expected value of the offset with respect to the invariant measure $\mu$ is very hard to obtain due to the nature of the formulas from Section 6, for any concrete value of N it is perfectly possible to compute this expected value using the explicit nature of the formulas for the measure of the quadrilateral associated to any digital line. We have done this for N = 10 and obtained:


TABLE 7.2

ERROR      PROBABILITY (MAX ERROR) > ERROR

0.5000     0.0000
0.2500     0.0147
0.1666     0.0294
0.1250     0.0735
0.1000     0.1323
0.0833     0.2794
0.0714     0.3676
0.0625     0.6323
0.0555     0.7794
0.5000     0.9412
0.0000     1.0000

Given an entry, a, in the first column, the corresponding entry in the second column is the percentage of digital lines of length ten whose maximum registration error exceeds a.

Line length = 10

Table 7.2 Error probabilities for digital lines without points missing


The procedure for estimating the best real edge giving rise to a digitization is improved considerably if we make use of the fact that the slope of the underlying real line is known. In our algorithm from the first year, discussed in Sections 2-7, we used a point on the best edge estimate to match with a corresponding point on the reference edge. This unfortunately assumed the horizontal offset was known and the vertical offset was to be estimated. By using intersecting lines we wanted to minimize the problem. In a new version of the algorithm the problem of needing to know the horizontal offset has been eliminated and the errors are considerably reduced.

The original algorithm uses the line midway between B and C where (A,B,C,D) are the vertices of the feasible quadrilateral corresponding to the digital line. This line will not in general have the correct slope so there is no way to map it into the reference edge by a translation. That is why we selected a point on the midline to use in interimage matching. Our new algorithm is identical to the old one except that we select the midline among all lines with the proper slope. In terms of the feasible quadrilateral for the digital edge, since we know the slope, we know we are on a fixed horizontal line in y-intercept, slope space. Thus we can restrict our attention to the intersection of this horizontal line with the feasible quadrilateral and select the midpoint of this line segment as our best edge in image space. We can now map our best edge estimate to the edge in the reference image, since the two lines have the same slope. Since we are mapping the entire estimated line and not a single point on it, we are not using subpixel information about the offset in the x-direction. The worst-case and expected error bounds previously derived provide bounds which in general are unduly pessimistic. Geometrically, one can see this as follows. The worst-case error in the previous algorithm is half the vertical distance between lines B and C. For a real edge with the same slope as B and C, this maximum error can be realized in the new algorithm. For any other slope, the maximum error is half the vertical distance between the uppermost and lowermost line of that slope lying in the channel formed by A, B, C, and D. But this maximum separation decreases monotonically as the slope moves away from that of B.

The above result on the decrease in error can be easily given analytically. Assume the line B has slope $m_1 = p/q$ and the real edge has slope $m_2 > m_1$. The proof for $m_2 < m_1$ is similar. Let the intersection of A and B be $(x_1, y_1)$ and let the intersection of C and D be $(x_2, y_2)$. Then $(x_1, y_1)$ is the left end point of the bottom of the channel and $(x_2, y_2)$ is the right end point of the top. Since $m_2 > m_1$, the lowermost line with slope $m_2$ and going through the channel passes through $(x_1, y_1)$. Similarly, the uppermost such line passes through $(x_2, y_2)$.

Let B have y-intercept b. Then B has the equation

$$y = m_1 x + b.$$

Since $(x_1, y_1)$ is on B, we see that $b = y_1 - m_1 x_1$. The vertical separation between B and C is 1/q so C has the equation

$$y = m_1 x + b + 1/q = m_1 x + y_1 - m_1 x_1 + 1/q.$$

The lowermost line with slope $m_2$ has the equation

$$y = m_2 x + y_1 - m_2 x_1$$

since it has slope $m_2$ and is constrained to pass through $(x_1, y_1)$. In a similar fashion, the uppermost line with slope $m_2$ has the equation

$$y = m_2 x + y_2 - m_2 x_2.$$

The difference in y-intercepts between the two lines with slope $m_2$ is (using the fact that $(x_2, y_2)$ lies on C):

$$h_2 = y_2 - m_2 x_2 - (y_1 - m_2 x_1)$$

$$= m_1 x_2 + y_1 - m_1 x_1 + 1/q - m_2 x_2 - y_1 + m_2 x_1$$

$$= 1/q + m_1(x_2 - x_1) - m_2(x_2 - x_1)$$

$$= 1/q + (m_1 - m_2)(x_2 - x_1).$$

Since $x_2 > x_1$ and $m_1 < m_2$, $(m_1 - m_2)(x_2 - x_1) < 0$. Hence $h_2 < h_1$, where $h_1 = 1/q$ is the separation for slope $m_1$, and we have shown that the maximum error goes down as $(m_1 - m_2)(x_2 - x_1)$. The quantities $x_1$ and $x_2$ can be calculated in terms of the vertices of the feasible quadrilateral.
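As an illustration of the formula for the intercept spread at a known slope, the following sketch evaluates it in two ways: directly from the closed form, and from the two extreme lines through the channel corners. The function names and the sample values are hypothetical and are not taken from the study; the case treated is the one in the text, where the real slope exceeds the slope p/q of B.

```python
from fractions import Fraction


def intercept_spread(p, q, m2, x1, x2):
    """Closed form: h2 = 1/q + (m1 - m2)(x2 - x1), with m1 = p/q and m2 > m1."""
    m1 = Fraction(p, q)
    return Fraction(1, q) + (m1 - Fraction(m2)) * (Fraction(x2) - Fraction(x1))


def intercept_spread_from_lines(p, q, m2, x1, y1, x2):
    """Same quantity from the lowermost and uppermost lines of slope m2."""
    m1 = Fraction(p, q)
    m2 = Fraction(m2)
    # (x2, y2) lies on C, which is B shifted up by 1/q.
    y2 = m1 * x2 + (y1 - m1 * x1) + Fraction(1, q)
    lower_intercept = y1 - m2 * x1   # line of slope m2 through (x1, y1)
    upper_intercept = y2 - m2 * x2   # line of slope m2 through (x2, y2)
    return upper_intercept - lower_intercept


if __name__ == "__main__":
    # Hypothetical channel: B has slope 1/3, corners at x1 = 0 and x2 = 1.
    h2 = intercept_spread(1, 3, Fraction(1, 2), 0, 1)
    print(h2, h2 / 2)  # intercept spread and the resulting maximum error
```

The maximum error of the midpoint estimate is half the returned spread; when the real slope equals p/q the spread reduces to 1/q, as in the original algorithm.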




Section  9.  Expected  Error  for  Vertical  Offset  Estimation 
Using  Slope 

This section provides further probabilistic analysis of our procedure for estimating the position of real lines. This procedure made use of the known slope of the real line to restrict the set of feasible lines to lie on a horizontal line segment in the feasible quadrilateral. We now describe various error expectations associated with these procedures. First we examine the expected error given the digitization and real line slope. Next we integrate this error over the quadrilateral corresponding to the digital line to obtain the expected error over all real lines with the specified digitization. Finally this is summed over the entire image. If we view the horizontal offset as known, this gives the expected error in the vertical offset. If we do not know the horizontal offset, the computation gives the error in the linear relation between x and y. A second line is then required to solve for values of the x and y offset.

The expected error given the correct digital line and slope of the real line is a trivial computation. We have shown that the set of all feasible lines is a horizontal line segment in y-intercept, slope space and that our estimate of the real line position is the real line in the image space corresponding to the midpoint of the line segment. Since the invariant measure on lines, when restricted to a horizontal line in y-intercept, slope space, is the uniform measure on the line segment, the expected error is merely the expected distance of a point on the line segment from the center of the line segment. The computation of the expectation is a simple computation using elementary calculus and the result is r/4 where r is the length of the line segment.
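For completeness, the elementary calculation behind the value r/4 can be written out as follows, taking the segment to be the interval [0, r] (a choice of coordinates made here only for convenience) with X uniformly distributed on it:

```latex
\mathbb{E}\left|X - \tfrac{r}{2}\right|
  = \frac{1}{r}\int_{0}^{r}\left|x - \tfrac{r}{2}\right|\,dx
  = \frac{2}{r}\int_{0}^{r/2} t\,dt
  = \frac{2}{r}\cdot\frac{1}{2}\left(\frac{r}{2}\right)^{2}
  = \frac{r}{4}.
```

The second equality uses the symmetry of the integrand about the midpoint r/2.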

The next problem is to determine, for a given quadrilateral corresponding to a digital line, what is the expected error in vertical offset estimation. This is the expected error in offset estimation given that we use the known slope of the real line to restrict ourselves to the appropriate horizontal line segment. More precisely, let $y_0$, $y_1$ denote the y-coordinates of the lowermost and uppermost vertices of a feasible quadrilateral and let h(y) denote the width of the feasible quadrilateral at height y. Then the expected error given that we are at height y is h(y)/4 and the expected error over the entire quadrilateral, Q, is

$$E(Q) = \frac{1}{k} \int_{y_0}^{y_1} \frac{1}{4} \cdot \frac{h(y)}{(1+y^2)^{3/2}}\,dy$$

where

$$k = \int_{y_0}^{y_1} (1+y^2)^{-3/2}\,dy.$$

Values of E(Q) for the various quadrilaterals are given in Table 9.1, columns 1 and 2.

Next the expected error, E, over the whole image is computed by multiplying the error associated with each quadrilateral, Q, by the probability of Q occurring (using the invariant measure). The value was computed to be .056 pixels. Thus the average accuracy in vertical intercept estimation given the correct digital line is approximately 1/20 pixel.


Table 9.1. Expected errors vs. slope for digital lines with zero and two missing pixels.

               Number of pixels missing
Slope          0            2

0.00000        0.00000      0.00000
0.01000        0.00161      0.00188
0.02000        0.00247      0.00326
0.03000        0.00282      0.00425
0.04000        0.00265      0.00494
0.05000        0.00244      0.00560
0.06000        0.00188      0.00624
0.07000        0.00181      0.00716
0.08000        0.00147      0.00840
0.09000        0.00222      0.01047
0.10000        0.00222      0.01190
0.11000        0.00281      0.01339
0.12000        0.00258      0.01444
0.13000        0.00384      0.01599
0.14000        0.00321      0.01618
0.15000        0.00415      0.01660
0.16000        0.00461      0.01730
0.17000        0.00393      0.01756
0.18000        0.00448      0.01977
0.19000        0.00847      0.02497
0.20000        0.00496      0.02282
0.21000        0.00512      0.02262
0.22000        0.00467      0.02371
0.23000        0.00758      0.02395
0.24000        0.01280      0.02392
0.25000        0.00804      0.02504
0.26000        0.00500      0.02529
0.27000        0.00651      0.02556
0.28000        0.00613      0.02723
0.29000        0.00642      0.03034
0.30000        0.00593      0.02889
0.31000        0.01135      0.02734
0.32000        0.01877      0.02472
0.33000        0.01684      0.02651
0.34000        0.00981      0.02937
0.35000        0.00718      0.03146
0.36000        0.00633      0.03304
0.37000        0.00815      0.03376
0.38000        0.00801      0.03684
0.39000        0.01545      0.04415
0.40000        0.00797      0.03730
0.41000        0.00812      0.03576
0.42000        0.01043      0.03772
0.43000        0.00840      0.03977
0.44000        0.00786      0.03755
0.45000        0.00826      0.03590
0.46000        0.01114      0.03732
0.47000        0.01909      0.03996
0.48000        0.02859      0.04245
0.49000        0.04435      0.04435
0.50000        0.02909      0.04325
0.51000        0.01976      0.04143
0.52000        0.01173      0.03940
0.53000        0.00885      0.03850
0.54000        0.00857      0.04091
0.55000        0.00932      0.04410
0.56000        0.01179      0.04263
0.57000        0.00934      0.04111
0.58000        0.00935      0.04374
0.59000        0.01885      0.05277
0.60000        0.01018      0.04502
0.61000        0.00983      0.04186
0.62000        0.00807      0.04183
0.63000        0.00901      0.04075
0.64000        0.01270      0.03893
0.65000        0.02283      0.03590
0.66000        0.02605      0.03430
0.67000        0.01613      0.03892
0.68000        0.00865      0.04199
0.69000        0.00962      0.04532
0.70000        0.00943      0.04195
0.71000        0.01032      0.04049
0.72000        0.00816      0.04133
0.73000        0.01354      0.04216
0.74000        0.02229      0.04167
0.75000        0.01367      0.04324
0.76000        0.00917      0.04453
0.77000        0.01023      0.04416
0.78000        0.01010      0.04612
0.79000        0.01870      0.05317
0.80000        0.01025      0.04422
0.81000        0.00874      0.04074
0.82000        0.01134      0.04258
0.83000        0.01104      0.04315
0.84000        0.00901      0.04484
0.85000        0.01158      0.04749
0.86000        0.00825      0.04599
0.87000        0.00971      0.04588
0.88000        0.00838      0.04490
0.89000        0.00927      0.04370
0.90000        0.00684      0.03918
0.91000        0.00957      0.03792
0.92000        0.01155      0.03836
0.93000        0.01789      0.04105
0.94000        0.02418      0.04498
0.95000        0.03403      0.05119
0.96000        0.04451      0.05874
0.97000        0.05776      0.06730
0.98000        0.07230      0.07740
0.99000        0.08883      0.08883

Total <error>  0.04291      0.11799

Section 10. Expected Errors with Pixels Incorrect

Much of the theoretical analysis in the present study deals with the problem of edge location when a digital edge is known, but the position of the underlying real edge is unknown. In this section we consider the analysis of edge location estimation in the presence of incorrect pixels in the digital edge. We have been unable to derive general formulas for the effects of these errors so sampling was required to develop an expected error. Our expectation calculation assumes at most two pixels are incorrect in our estimated digital edge where the length of the edge is ten pixels. At the time these computations were performed, we were not certain how accurately we could find the digital line, but computational considerations made the examination of additional incorrect pixels rather expensive.

The expected error for this phase of our study was defined as follows. Assume a given digital edge has been decided to be the digital edge corresponding to a real edge. Since we know the slope of the underlying real edge, this constrains the position of the underlying edge significantly. In y-intercept, slope space the point corresponding to the real edge must lie on a horizontal line whose height is the known slope of the line. We consider as feasible digital edges those which intersect this line and differ in no more than two pixels from the correct quadrilateral. An expectation is now taken over these digital edges.

We now describe the above ideas more formally and define the expected error. Assume a quadrilateral $Q_1$ corresponds to the computed digital line L. Let the underlying real line in the reference image have slope $\alpha$. Let $Q_2, \ldots, Q_k$ denote the set of all quadrilaterals such that the digital line corresponding to each $Q_i$, i=2,...,k, differs from the digital line corresponding to $Q_1$ by no more than two pixels and such that the line $y = \alpha$ intersects $Q_i$. The set

$$S = \bigcup_{i=1}^{k} (Q_i \cap L),$$

where L now denotes the line $y = \alpha$, is connected. Hence S is a line segment. Relabel the subscripts on the $Q_i$'s so that $Q_i \cap L$ is to the left of $Q_j \cap L$ if i < j. Let $i_0$ denote the index of the computed digital line. The estimated position of the real line is the midpoint, $(x_0, \alpha)$, of $Q_{i_0} \cap L$. The error in y-intercept location if the real line was some other point $(x_1, \alpha)$ on S is just $|x_1 - x_0|$.

The expected error is obtained by multiplying $|x - x_0|$ by its probability and integrating over S. Since the invariant measure on lines in x-y space is uniform on horizontal lines in y-intercept, slope space, we need only integrate $|x - x_0|$ with respect to the Euclidean measure and divide by the length of S. This calculation was done for each of 100 slopes between 0 and 1. The expected error was 0.12 pixels. Results are given in Table 9.1, columns 1 and 3.
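The expectation described above has a simple closed form once the segment S and the estimate $x_0$ are known. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the pieces $Q_i \cap L$ abut one another so that S can be described by a sorted list of cut points, which is not stated explicitly in the text.

```python
def expected_abs_error(s_left, s_right, x0):
    """Mean of |x - x0| for x uniform on [s_left, s_right], with x0 inside it."""
    length = s_right - s_left
    if length <= 0:
        raise ValueError("S must have positive length")
    if not s_left <= x0 <= s_right:
        raise ValueError("x0 must lie on S")
    # Integrating |x - x0| gives two triangles with legs (x0 - s_left)
    # and (s_right - x0); dividing by the length of S normalizes the measure.
    return ((x0 - s_left) ** 2 + (s_right - x0) ** 2) / (2 * length)


def estimate_error(cuts, i0):
    """Expected error when the estimate is the midpoint of piece i0 of S.

    cuts -- sorted end points c_0 < c_1 < ... < c_k of the pieces of S
            (assumed to abut); piece i is [c_i, c_{i+1}], zero-based.
    i0   -- zero-based index of the piece for the computed digital line.
    """
    x0 = (cuts[i0] + cuts[i0 + 1]) / 2
    return expected_abs_error(cuts[0], cuts[-1], x0)


if __name__ == "__main__":
    # Hypothetical cut points along the horizontal line in intercept-slope space.
    print(estimate_error([0.0, 1.0, 2.0, 4.0], 1))
```

When S consists of a single piece, the result reduces to one quarter of its length, which agrees with the r/4 value for the case of a correct digital line.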

Section 11. One-dimensional Edge Projections

The fitting of a continuous edge to a digital image is a promising approach to subpixel edge location. In [Ha], discrete orthogonal polynomials are used to fit a continuous surface to an image and the vanishing of the second directional derivative is used to locate edge points. Under the assumptions of our current research, we are using a high resolution reference image with a straight line of known position. In addition, it is assumed that no rotation is present between sensed and reference images. This constrains the fitting problem considerably.

Two basic types of fitting approaches can be applied to this problem. First, we can fit a two-dimensional surface to the image near the edge and then find the best fit of a straight edge with known slope to the surface. This problem is complicated by the unusual shape of the region near a straight edge in a digital image. Discrete orthogonal polynomials are most easily fit to rectangular regions in which the sides of the rectangle are horizontal or vertical. Square patches could be fit to neighborhoods of pieces of the edge neighborhood and edge points extracted, but the computational costs would be much higher than in the second fitting approach, the one-dimensional fit.

The one-dimensional fitting which we adopted for experimentation uses heavily the fact that we know both the slope of the edge and its approximate position. The fitting procedure is quite simple, but the present computation has not been optimized. Pixel centers for all pixels near the edge are projected onto a line perpendicular to the known direction of the edge. To each projected center we associate the grey level of the pixel. Thus we end up with a possibly multi-valued function on a finite subset of a line. We now fit an edge projection to the line. By performing the above operations on a number of digital edges without noise, we observed that the one-dimensional projection of an ideal digital edge could be represented by a continuous curve consisting of two horizontal line segments connected by a slanting line segment.

To perform the fit on the projected points we used the mean grey levels in the regions above and below the edge to estimate the heights of the two horizontal segments for the fitted 1-d edge. The only remaining quantities to be estimated are the horizontal coordinates of the ends of the slanted segment. We are presently using exhaustive search to compute the best, in the sense of least squares, fit of the three-segment piecewise linear edge to the projected data. Since the only variation in the fitting of the horizontal pieces results from the varying end-points, the square errors for the two horizontal segments, separately with all possible endpoints, are computed prior to computing the total error of a fitted piecewise linear edge. This greatly facilitates the computation since the square error obtained in using a piece of the horizontal line segment to represent the data is obtained by a simple updating of the corresponding calculation for shorter pieces.

We now describe the algorithm more precisely, discuss theoretical considerations affecting the performance of the algorithm and give experimental results. We assume an initial estimate of the translation offset is known to within about a pixel. In particular we assume we have a real translation from the high resolution reference plane to the image plane. Thus the transformation can be used to map the reference line into a real line in the image plane. We assume that each point on this mapped line lies within about a pixel of the corresponding correct subpixel location on the image plane.

The mapped real line segment is now digitized. We differ from our previous definition of digitization only for this section, and assume the digitization consists of all pixels which the line intersects. A neighborhood of the digitization is now grown. We defined our neighborhood to consist of all pixels lying on the assumed digitization or which were 8-neighbors of such a pixel. Let $(x_i, y_i)$, i=1,...,N denote the coordinates of the centers of these pixels, and let $\alpha$ denote the slope of the reference line. Let L denote a line with slope $-1/\alpha$ and going through the origin. Then the perpendicular projections of the $(x_i, y_i)$ on L are given by $S = \{x_i \sin\gamma + y_i \cos\gamma\}$, i=1,...,N, $\gamma = \arctan\alpha$. We are only specifying the coordinate along L. Note that the points of S are not, in general, unique, as can easily be seen if the original line is horizontal and L is vertical. Let $a_1, \ldots, a_N$ denote the points of S in non-descending order (duplications are allowed).
We now describe the fitting of the piecewise linear curve to the data. Let $m_1$ denote the mean grey level on the side of the edge corresponding to the smaller $a_i$'s and let $m_2$ denote the mean grey level on the side of the edge corresponding to the larger $a_i$'s. One approach to the estimation of these means is to begin at the midpoint of the mapped real edge and move several (5 or 6) pixels from this point in each direction away from the mapped real edge and use these pixels as the centers for small windows used for estimating the region means. Due to our experimental setup, we avoided this issue but many approaches are available. The regions abutting an edge in the reference image can be roughly outlined when the reference edge is first delineated. Alternatively, a window in each abutting region which is not close to the region boundary can be outlined. The mapped versions of these windows in the sensed image can then be used to estimate the means. Note that we are not comparing the grey levels in the sensed image with those in the reference image.

Once the region means $m_1$ and $m_2$ have been computed, we can compute the best edge fit. For each $1 \le i, j \le N$ with i < j we computed the merit of the fit obtained using $a_i$ and $a_j$ as the endpoints of the middle segment of the fit. For each (i,j) as above we define the error e(i,j) associated with (i,j) by:

$$e(i,j) = \sum_{p=1}^{i} (g(a_p) - m_1)^2 + \sum_{p=i}^{j} (g(a_p) - (m a_p + b))^2 + \sum_{p=j+1}^{N} (g(a_p) - m_2)^2$$

where

$$m = (g(a_j) - g(a_i))/(a_j - a_i)$$

and $b = -m a_i + g(a_i)$.

The middle of the three summands represents the quality of fit for the slanted segment. We define the optimal piecewise linear fit to be given by the following segments (where $(\hat{i}, \hat{j})$ minimizes e(i,j)):

1) the segment from $(a_1, m_1)$ to $(a_{\hat{i}}, m_1)$

2) the segment from $(a_{\hat{i}}, m_1)$ to $(a_{\hat{j}}, m_2)$

3) the segment from $(a_{\hat{j}}, m_2)$ to $(a_N, m_2)$

We are currently using $(a_{\hat{i}} + a_{\hat{j}})/2$ as the estimate of the intersection of the real edge with L. This estimate provides an estimate of the translation offset between the sensed and reference image in the direction along L. A second estimate in another direction (preferably perpendicular to L) is necessary to obtain an offset estimate for the x and y translations. The procedure for and analysis of combining such estimates into a single estimate is described in Section 13.

Several factors affect the accuracy of the estimate obtained using the above procedure. First, we must consider the distribution of the points $a_i$ on the line L. The fact that the line segments begin and end on $a_i$'s presents a limitation on the accuracy attainable using this procedure. The $a_i$'s are of the form $x_i \sin\gamma + y_i \cos\gamma$ where $(x_i, y_i)$ is the center of a pixel near the edge. For simplicity, assume the real edge goes through the origin. Then, in order for the point $(x_i, y_i)$ to be within a pixel of the edge we must have something like $|y_i - \alpha x_i| < 2$. The exact inequalities appear rather complicated but the above approximation is based on the fact that if we look at the vertical separation between the edge and the point $(x_i, y_i)$, the vertical separation must be less than 2 if $(x_i, y_i)$ is to be adjacent to an edge pixel. Points $(x_i, y_i)$ can actually be further away vertically and still be close to the line since we are really interested in the perpendicular distance to the line. Thus the above approximation becomes more accurate the closer $\alpha$ is to zero. We can approximate the $y_i$ by $[\alpha x_i + k]$ where $-2 \le k \le 2$. This gives us the following expression for the projection points:

$$x_i \sin\gamma + [\alpha x_i + k] \cos\gamma$$

where $-2 \le k \le 2$ and $0 \le x_i \le h\cos\gamma$ where h is the length of the edge. While this gives an explicit expression for the projections of pixels close to an edge, we have not been able to analyze the projections, even under this more restrictive model. Number theoretic results related to integer linear combinations of irrational numbers offer some promise of shedding more light on this problem. An alternative approach is to compute the projection points for a large number of angles and lengths.

The exact manner in which the distribution of gaps between $a_i$'s would be used is not entirely clear, though error bounds can be readily estimated. If we know that the maximum separation between any two consecutive $a_i$'s is d, and we further assume that the algorithm is accurate to the nearest $a_i$, then the maximum error due to the spacing of the $a_i$'s is d.

One approach to improving the above algorithm might be
to better model the form used to fit the projection
points. The model consisting of two horizontal segments
separated by a slanted segment was based on the observation
that this behavior occurred in projecting an ideal edge.
Two types of refinements could be explored. First, for a
given pair of mean values for the two abutting regions and
for the given slope, one could compute the best fitting
piecewise linear segment for an ideal image with homogeneous
regions using the two means. In this situation,
only the y-intercept of the real line could be varied and
the corresponding variation in the slope of the slanted
line segment could be recorded. Only slopes in that range
would then be used in fitting to the noisy image data. The
best fit, in the least squares sense, as currently done in
the algorithm may be better than the best least squares fit
in this new approach, but it would then represent an impossible
edge digitization. Thus the proposed procedure could
both reduce computation, by reducing the number of middle-section
end point pairs examined, and increase the accuracy
of the procedure.

A second possible refinement to the current algorithm
is to find the best estimate for the intersection of the
real edge and L given the fitting projection edge. We
selected the mid-point of the mid section for computational
simplicity, but it may not be optimal for all combinations
of slopes and mean grey levels. Both refinements could be
investigated by extensive sampling, but theoretical computations
would be preferable.

The effect of noise on the above algorithm has not
been modeled. Since the effect of noise seems to be
strongly coupled with the geometry of projections, the
problem appears to be quite difficult. Removal of outlying
grey values may improve the signal-to-noise ratio but
disturb the geometry.

Experimentation was performed using the above 1-d edge
fitting method. As the initial results were promising,
more extensive experimentation is planned. In order to
perform the above experiments, it was necessary to have an
image in which the position of an underlying real edge was
known to very high precision. Two windows, an 8x8 and a
4x4, were selected from two agricultural fields in a
Landsat image and each was repeated to provide two 32x32
windows, each representing a different type of field. A
procedure was developed to splice the two images to form a
third image with an edge whose position is known to very
high accuracy. The procedure accepts as input a real line
with slope between 0 and 1 which hits opposite sides of
the 32x32 window. All pixels lying entirely below the edge
are taken from the corresponding positions in window 1.
All pixels lying entirely above the real line are taken
from the corresponding positions in window 2. Each pixel
intersecting the real line is given a weighted average of
the corresponding pixels in the first two windows. The
weights are simply the areas of the parts of the pixel
lying above and below the real edge. A real edge length
of 10 was used. Fourteen angles at equal increments
between 1 and 25 degrees were used. Table 11.1 gives the
magnitude of the error for each line. The average error
is .30 pixels.
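The splicing procedure described above can be illustrated with a short sketch. This is not the authors' program: the function names, the list-of-rows image layout, and the convention that row 0 is the bottom row of the window are assumptions made for illustration. The sketch computes, for a line y = m*x + b with slope between 0 and 1, the exact area of each unit pixel lying below the line and uses it as the mixing weight.

```python
def area_below(m, b, col, row):
    """Area of the unit pixel [col, col+1] x [row, row+1] below y = m*x + b.

    Assumes 0 <= m <= 1, as in the text.
    """
    c = m * col + b - row  # height of the line above the pixel floor at x = col
    if m == 0:
        return min(max(c, 0.0), 1.0)
    t0 = min(max(-c / m, 0.0), 1.0)         # line rises through the pixel floor
    t1 = min(max((1.0 - c) / m, 0.0), 1.0)  # line rises through the pixel ceiling
    ramp = c * (t1 - t0) + 0.5 * m * (t1 * t1 - t0 * t0)
    return ramp + (1.0 - t1)


def splice(window1, window2, m, b):
    """Blend two equal-size square windows across the line y = m*x + b.

    Pixels below the line come from window1, pixels above from window2,
    and pixels cut by the line get the area-weighted average.
    Row 0 is taken to be the bottom row (an assumed convention).
    """
    size = len(window1)
    out = []
    for row in range(size):
        out_row = []
        for col in range(size):
            a = area_below(m, b, col, row)
            out_row.append(a * window1[row][col] + (1.0 - a) * window2[row][col])
        out.append(out_row)
    return out
```

For a horizontal line at height 2.5 in a 4x4 window, rows 0 and 1 are copied from window 1, row 3 from window 2, and row 2 is an equal mixture of the two.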


Table 11.1 Errors in one-dimensional line fitting. Lines
are at angles varying between 1 and 25 degrees in equal
increments.

Section 12 Fitting a Digital Edge to an Image
This section describes a grey level generalization of
our digital edge fitting procedure. From the beginning of
our work in the area of subpixel accuracy, it was felt that
grey level information should ultimately be used in the
locating of edges to subpixel accuracy rather than merely
using the grey levels to get the digital line and then
using the geometric methods. The analysis of the grey
level approach appeared formidable, so we restricted ourselves
initially to the investigation of the geometric
methods. In Section 11 we studied one means of incorporating
grey-level information, namely by projecting grey
levels onto a real line perpendicular to the known edge
direction. A piecewise linear ideal edge was then fit to
the data and the offset in a direction perpendicular to the
edge was estimated. We now describe several subpixel edge
procedures that use the two-dimensional image directly.

The basic idea of our first fitting procedure is to
generate ideal two-dimensional edge images based on
digitizing various real edges and find the one which best
fits the sensed image. The means for the areas above and
below the edge in the sensed image are first computed using
the techniques outlined in Section 11. Next the approximate
pixel location of the edge is determined. An estimate
of the correct pixel from a digital registration procedure
is assumed available. Without loss of generality, we may
assume the lower left-hand corner of the pixel has coordinates
(0,0). A real line with the correct slope and a
y-intercept of 0.5 is used to generate an edge image. Grey
levels for all pixels intersecting this edge are computed.
The grey level for an edge pixel is defined to be the
weighted average of the average grey levels for the
regions above and below the edge in the sensed image. As
usual, the weights are the areas in the mixed pixel above
and below the real edge.

The algorithm compares the generated digitization of an
edge with the corresponding pixels in the sensed image.
The generated edge is in the same coordinate system as the
sensed image, so it is meaningful to compare corresponding
pixels. For each pixel, we compute the difference between
the sensed and generated grey level. The sum of these
differences is used to locate the edge. If the real edge
is correct then we expect this sum to be close to zero.
In general, the sign of the sum can be used to guide the
search. From the sensed image we know whether the lower
or upper region has a higher average grey level. Without
loss of generality, we may assume the upper region has a
higher average grey level. If the sum is a large positive
number then the mixed pixels are producing, on the average,
too low a grey level. Thus the estimated real edge position
should be shifted down. Similarly, with a large
negative value for the sum, the estimated real edge should
be shifted up. This procedure is carried out in increments
of a pixel until the sum changes sign. Upon termination we
have a refined estimate for the pixel location of the edge.

The next phase of the algorithm attempts to locate the
edge to subpixel accuracy. A new real edge is generated
with an intercept which is the average of the current
intercept and the nearest previous intercept in the direction
indicated by the sign of the merit sum. As in the
pixel level edge location method, the sum of differences is
computed for corresponding pixels, and the search is terminated
when the possible change in y-intercept is less than
a specified tolerance.

The above procedure was carried out on simulated
imagery formed from LANDSAT data as described in Section
11. Real lines of slope ranging from 0.0 to 1.0 in increments
of 0.01 were used. The results are given in Table
12.1. Of the 100 slopes tested, only 5 were worse than
0.2 pixels. Seventy-seven percent of the cases were 0.1
pixels or better.
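The two-phase search described above (whole-pixel steps until the merit sum changes sign, then repeated halving of the intercept interval) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function names, the bottom-row-first image layout, the use of every pixel in the window in the sum, and the noise-free unquantized test image are all assumptions.

```python
def area_below(m, b, col, row):
    """Area of the unit pixel [col, col+1] x [row, row+1] below y = m*x + b (0 <= m <= 1)."""
    c = m * col + b - row
    if m == 0:
        return min(max(c, 0.0), 1.0)
    t0 = min(max(-c / m, 0.0), 1.0)
    t1 = min(max((1.0 - c) / m, 0.0), 1.0)
    return c * (t1 - t0) + 0.5 * m * (t1 * t1 - t0 * t0) + (1.0 - t1)


def ideal_edge_image(m, b, low, high, size):
    """Ideal image: `low` below the line, `high` above, area-weighted on the edge."""
    image = []
    for row in range(size):
        image.append([])
        for col in range(size):
            a = area_below(m, b, col, row)
            image[row].append(a * low + (1.0 - a) * high)
    return image


def merit_sum(sensed, m, b, low, high):
    """Sum over all pixels of (sensed grey level - generated grey level)."""
    ideal = ideal_edge_image(m, b, low, high, len(sensed))
    return sum(s - g for s_row, g_row in zip(sensed, ideal)
               for s, g in zip(s_row, g_row))


def fit_intercept(sensed, m, b0, low, high, tol=1e-4):
    """Estimate the y-intercept of the edge. Assumes high > low, so a
    positive merit sum means the trial edge lies too high."""
    b, s = b0, merit_sum(sensed, m, b0, low, high)
    step = -1.0 if s > 0 else 1.0
    # Phase 1: move in whole-pixel steps until the sum changes sign.
    for _ in range(2 * len(sensed)):
        b_next = b + step
        s_next = merit_sum(sensed, m, b_next, low, high)
        if (s_next > 0) != (s > 0):
            break
        b, s = b_next, s_next
    else:
        raise ValueError("no sign change found inside the window")
    # Phase 2: halve the bracketing interval until it is shorter than tol.
    lo, hi = min(b, b_next), max(b, b_next)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if merit_sum(sensed, m, mid, low, high) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

On an ideal 8x8 image generated with slope 0.3 and intercept 3.37, starting the search at intercept 5.5 recovers the intercept to within the tolerance; with noisy or quantized data the recovered value would of course be less exact.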

A second procedure was based on the idea of fitting a
digital edge to an image. Note that in the previous procedure,
the digitization of an edge, not a digital edge,
was used. Recall that the digitization of an edge contains
all pixels intersecting the edge, while the digital edge
corresponding to a real edge contains the bottommost pixel
in each column of the digitization of the edge. In the
second procedure, the set of all digital edges which could
be generated by an edge with the specified real edge slope
was generated. For each such digital edge, the quality of
fit of the digital edge to the image was computed and the
digital edge with the best fit was selected.

The quality of fit measure is a non-negative real
valued function which provides a rough measure of the edge
quality of a set of pixels. Larger values indicate that
the pixels are likely to lie on an edge. The merit $m(S)$
of a set $S$ of pixels is defined to be

$$m(S) = \sum_{P \in S} |u(P) - l(P)| \, \bigl(1 - \min(|(\mathrm{low} + \mathrm{high})/2 - g(P)|,\ 1)\bigr)$$

where the sum is over all pixels $P$ in the set, $u(P)$ and
$l(P)$ are the grey levels of the pixels immediately above
and below $P$, $g(P)$ is the grey level for the pixel $P$, and
low and high are the means for the regions below and above
the edge. This approach led to subpixel accuracy but the
results were much poorer than for the digitization of the
real edge. The results are given in Table 12.1.
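As a concrete reading of the merit function, the following sketch evaluates m(S) for a set of pixels. The function name, the (row, column) pixel addressing, and the convention that the row index increases upward are assumptions for illustration; pixels in S are assumed not to lie in the first or last row, so that both vertical neighbours exist.

```python
def edge_merit(image, pixels, low, high):
    """Merit m(S) of a set of pixels, following the formula in the text.

    image  : list of rows of grey levels; row index increases upward (assumed)
    pixels : iterable of (row, col) pairs making up the set S
    low, high : mean grey levels of the regions below and above the edge
    """
    mid = 0.5 * (low + high)
    total = 0.0
    for row, col in pixels:
        upper = image[row + 1][col]   # u(P): pixel immediately above P
        lower = image[row - 1][col]   # l(P): pixel immediately below P
        closeness = 1.0 - min(abs(mid - image[row][col]), 1.0)
        total += abs(upper - lower) * closeness
    return total
```

A pixel whose grey level equals the midpoint of the two region means contributes the full vertical contrast, while a pixel one or more grey levels away from the midpoint contributes nothing.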

The procedure using the digitization of the edge, as
opposed to the digital edge, has the advantage that it is
extendible to regions with curved boundaries, since the
digitization scheme can be applied to any boundary. We are
not pursuing this in the current study since the
straight-edge method still requires considerable
investigation. We are beginning the study of probabilistic
models for straight-edge error analysis using the
methods of this section.

Several basic sources of error should be considered in
our edge-fitting procedure. First, the performance of the
method deteriorates with an increase in the noise in
abutting regions. This noise has two facets. It can
result in inaccuracies in the calculation of the means for
adjacent areas and it can result in poor fitting due to
noise in the edge pixels themselves. Another source of
error is quantization. If the two regions each had
constant grey levels and the grey levels were not
quantized, then it is easy to show the edge positions could
be determined exactly. As soon as quantization is introduced,
the results deteriorate, since shifts in the underlying
edge position do not necessarily result in shifts in
the quantized grey levels. Analysis of this source of
error is planned for future work.

The edge fitting methods discussed in this section
represent an initial effort at subpixel edge estimation
using a mixture of grey level information and digital geometry.
In the very preliminary experiments performed in
this section, it appears that a high level of subpixel
registration accuracy may be possible using this basic
approach. Considerable refinement of these methods is
possible through refinement of the merit functions and search
procedures.


Table 12.1. Comparison of two subpixel edge detection
algorithms. Although the number of incorrect pixels can be
large, the real intercept difference can remain small. The
Real Line Digitization proved to be the most accurate of
the algorithms, where the directly estimated intercept is
used.

Error1 = distance between estimated intercept and the
correct intercept

Error2 = distance between average digital line intercept
and the correct intercept

Pix = number of pixels generated that are not exactly on
the edge generated by the underlying real line.


           Real Line Digitization         Digital Line Mask
Slope      Error1     Pix    Error2       Pix    Error1

0.00000    0.09457     4     0.09500       10    0.61000
0.01000    0.05584     0     0.28000        3    0.17000
0.02000    0.07180     0     0.17000        3    0.25500
0.03000    0.05992     0     0.06000       10    0.38000
0.04000    0.05260     0     0.05000       10    0.32500
0.05000    0.04984     0     0.16000       10    0.19000
0.06000    0.00188     0     0.01500       10    0.27000
0.07000    0.00266     0     0.00000       10    0.38000
0.08000    0.00055     0     0.03500       10    0.49000
0.09000    0.02109     0     0.05000       10    0.45000
0.10000    0.00465     0     0.04500       10    0.29000
0.11000    0.00090     0     0.02000       10    0.30000
0.12000    0.00984     0     0.00500        2    0.13500
0.13000    0.01391     0     0.04000       10    0.25000
0.14000    0.03359     0     0.00000       10    0.62500
0.15000    0.03180     0     0.02000       10    0.50000
0.16000    0.01750     0     0.05500       10    0.29000
0.17000    0.03992     1     0.07000        0    0.03000
0.18000    0.04073     0     0.03000       10    0.11500
0.19000    0.00257     0     0.10000       10    0.20000
0.20000    0.03258     0     0.02000       10    0.41500
0.21000    0.00734     0     0.03000       10    0.53000
0.22000    0.02326     0     0.03000        1    0.14500
0.23000    0.03141     0     0.02000        2    0.26000
0.24000    0.04974     1     0.12500        2    0.25000
0.25000    0.07797     0     0.01000        0    0.01000
0.26000    0.02547     0     0.01000       10    0.37500
0.27000    0.01516     0     0.02000       10    0.50000
0.28000    0.01891     0     0.03500       10    0.69000
0.29000    0.00990     0     0.05000        1    0.05000
0.30000    0.02055     0     0.00500        2    0.13500
0.31000    0.01193     0     0.02000       10    0.22000
0.32000    0.04302     0     0.13000       10    0.30500
0.33000    0.03505     0     0.10000       10    0.58000
0.34000    0.03555     0     0.00000        4    0.35000
0.35000    0.02758     0     0.00000        5    0.46000
0.36000    0.02807     0     0.01500        4    0.38500
0.37000    0.00984     0     0.01000       10    0.49000
0.38000    0.03297     0     0.01500       10    0.40000
0.39000    0.02355     0     0.10000       10    0.40000
0.40000    0.09234     1     0.10500       10    0.62500
0.41000    0.01427     0     0.01000        3    0.33000
0.42000    0.03234     2     0.09000        6    0.51500
0.43000    0.00484     0     0.00000       10    0.46000
0.44000    0.00859     0     0.00000        1    0.10000
0.45000    0.00035     0     0.02000        2    0.18000
0.46000    0.10598     0     0.06500       10    0.55000
0.47000    0.00359     0     0.04000        4    0.34000
0.48000    0.03141     0     0.14500        4    0.42000
0.49000    0.01660     2     0.25000        3    0.25000
0.50000    0.04802     0     0.14500       10    0.36500
0.51000    0.04005     0     0.04000       10    0.48000
0.52000    0.04585     0     0.06500       10    0.59500
0.53000    0.03453     1     0.06000       10    0.71000
0.54000    0.03270     0     0.00000        5    0.45000
0.55000    0.04138     1     0.10000        6    0.54000
0.56000    0.02547     0     0.05500       10    0.63000
0.57000    0.05051     0     0.01000        3    0.25000
0.58000    0.01617     0     0.01000        3    0.28500
0.59000    0.03880     0     0.10000        3    0.40000

[Table 12.1, continued: the entries for slopes 0.60000 through
0.99000 are not legible in the source.]


Table 12.2. Summary of subpixel edge detection algorithm
errors.

            Real Line Digitization          Digital Line Mask
            Error1    Pix       Error2      Pix       Error1

Maximums    0.13352   6         0.50000     10        1.27500
Averages    0.04741   0.46000   0.07570     7.52000   0.46450
St. Dev.    0.03832   0.9210?   0.08155     3.50280   0.26417

Section 13 Pairs of Lines

The  matching  of  a line  in  a reference  image  with  a line 
in  a sensed  image  only  determines  a linear  relation  between 
the  x and  y offsets  for  the  sensed  image.  A second  linear 
relation  resulting  from  a matching  of  a second  line  between 
reference  and  sensed  images  can  then  be  used  to  get  an 
estimate  for  the  x and  y offsets.  In  this  section  we 
examine the offset estimation accuracy resulting from this
approach.

We consider an image in which two perpendicular edges
are used to estimate the offset between sensed and reference
images. Let line $L_1$ in the images lead to a linear
relationship $y = mx + b_1$ between the offsets. Note that this is
not the equation of the real line $L_1$ but the equation relating
the x and y offsets resulting from trying to locate $L_1$
in the reference image. If $L_2$ is perpendicular to $L_1$, then
the corresponding relationship between the offsets for x
and y using $L_2$ is given by $y = -(1/m)x + b_2$. The knowledge of
the correct digital lines for $L_1$ and $L_2$ gives rise to error
bounds on $b_1$ and $b_2$. Thus the correct offset relations are
actually in the sets

$$S_1 = \{(x,\ mx + b_1 + h) \mid e_1 < h < e_2\}$$

and

$$S_2 = \{(x,\ -(1/m)x + b_2 + k) \mid e_3 < k < e_4\},$$

where $e_1$, $e_2$, $e_3$, and $e_4$ are the error bounds on the linear
relationship between the x and y offset estimates.
The set of feasible offset estimates is the intersection
of the two infinite strips, $S_1$ and $S_2$. The intersection
of these two strips is a quadrilateral in x-offset,
y-offset space. In the event that the real world edges are
perpendicular, the resulting quadrilateral in offset space
will be a rectangle. The error in the x and y estimated
offsets is a function of the angle between the image edges,
the slope of the edge and the error bounds on the linear
relationships between the x and y shifts resulting from the
individual edge matchings. As an idea of the magnitude of
the error, perpendicular edges with equal bounds, say $r$, on
the error in the x-y offset linear relation estimation and
with slopes 1 and -1 will have a maximum error of $r\sqrt{2}$.

Keeping all parameters but the slopes fixed, the error
increases as the slopes move away from 1 and -1.
Intuitively, we are considering a square in offset space
where the sides of the square are parallel to the edges in
the image. The sides of the square represent the error in
the linear relationship and the horizontal and vertical
extent of the square give the variation in the possible
correct x and y offsets.
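The feasible region in offset space can be computed directly from the two strips. The sketch below is an illustration only: the function names are hypothetical, the error bounds are treated as bounds on the intercepts of the two offset relations, and the second relation is taken to have slope -1/m, as perpendicularity of the two edges requires (so m must be nonzero).

```python
from itertools import product


def feasible_offsets(m, b1, b2, e1, e2, e3, e4):
    """Corner points of the intersection of the two strips.

    Strip 1: y = m*x + b1 + h,      e1 < h < e2
    Strip 2: y = -(1/m)*x + b2 + k, e3 < k < e4
    """
    corners = []
    for h, k in product((e1, e2), (e3, e4)):
        # Equate the two bounding lines and solve for x.
        x = (b2 + k - b1 - h) / (m + 1.0 / m)
        y = m * x + b1 + h
        corners.append((x, y))
    return corners


def offset_extent(corners):
    """Horizontal and vertical extent of the feasible region."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return max(xs) - min(xs), max(ys) - min(ys)
```

For slopes 1 and -1 with intercept bounds of plus or minus 0.1 on both relations, the corners are (0, -0.1), (0.1, 0), (-0.1, 0) and (0, 0.1), a square whose sides are parallel to the two edges.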

More detailed error analysis has been carried out but
this analysis is not directly useful until more extensive
analysis of the digital edge fitting methods has been
performed. This section has described a procedure for
taking the bounds from individual edge matching and producing
bounds on the offset estimation error resulting
from a pair of matching edges.


Section 14 Geometric Registration Summary


The  previous  fourteen  sections  give  an  overview  of  our 
work  on  geometric  methods  in  registration.  In  the  latter 
sections,  grey  level  information  was  directly  incorporated 
into  the  edge  location  process.  In  this  section,  we 
attempt  to  put  matters  into  perspective. 

Subpixel edge position estimation can be used for
registration and scene analysis. Strictly geometric
methods based on the observation of the correct digital
line can be quite accurate (averaging about 1/20 of a
pixel). As the number of incorrect pixels is allowed to
increase, the estimation errors, of course, increase. The
average error over all lines, given that the digitization
has at most two incorrect pixels, is .118. To make these
figures useful, we must know how well we can find the
correct digital line.

In the process of developing methods for finding the
correct digital line, we came up with methods which generalized
our original algorithms and directly estimated line
positions. In limited experimentation, one of these
methods resulted in an average error of .047 pixels with a
standard deviation of .038. This approach appears quite
promising, though experimentation is in a very early stage.
This work was done on grey level simulated images. We hope
to extend the analytical study of geometric registration
error to this procedure and perform more comprehensive
experimental studies. If the algorithm continues to appear
promising, we will examine various means to improve its
efficiency and reliability.


Section 15 Random Fields and Subpixel Accuracy
In the previous report [La] the problem of subpixel
translation-registration was posed in the context of sensed
and reference random fields in the plane for which the
correlation statistic ($C(\cdot)$, defined below) forms approximately
a Gaussian random field. For such sensed and
reference fields, a theoretical upper bound was found for
the probability of local misregistration by T pixels or
more. In this section, we briefly summarize and specialize
the most useful models and results from that previous report
for further comparison with empirical results.
All our models are based on the assumption that a
nonrandom reference field $Z_R(x)$ is specified at all lattice
coordinates $x = (x, y) \in h\mathbb{Z}^2$ (i.e., integer multiples of the
fixed pixel-dimension $h$), and that the sensed image $Z_S(x)$
(again at all $x \in h\mathbb{Z}^2$) has the form

$$Z_S(x) = Z_R(x + \theta) + Z_N(x)$$

where $\theta$ is the unknown offset vector, not necessarily in
$h\mathbb{Z}^2$, which is the object of inference in registration
problems; and where $Z_N(\cdot)$ is a strictly stationary mean 0
random field which is also assumed to satisfy the mixing
condition of [De] mentioned in [La].

Further assumptions are required to describe the
continuous variation of the fields $Z_R$, $Z_N$, $Z_S$ between pixel
corners. First of all, we assume (denoting $(\lfloor t_1 \rfloor, \lfloor t_2 \rfloor)$ by
$\lfloor t \rfloor$ and $\{t\} = t - \lfloor t \rfloor$)

(39)
$$Z_N(t) = (1-\{t\}_1)(1-\{t\}_2)\, Z_N(\lfloor t \rfloor) + (1-\{t\}_2)\{t\}_1\, Z_N(\lfloor t \rfloor + e_1) + (1-\{t\}_1)\{t\}_2\, Z_N(\lfloor t \rfloor + e_2) + \{t\}_1 \{t\}_2\, Z_N(\lfloor t \rfloor + \mathbf{1})$$

where we have defined units by letting $h = 1$ and $e_1 = (1,0)$,
$e_2 = (0,1)$, $\mathbf{1} = (1,1)$. This assumption means that $Z_N$ at a
point $t$ interior to a given pixel $J$ takes a value which is a
weighted average of the values at the corners of $J$ with
weights proportional to the area of overlap of a unit square
with lower-left corner $t$ with squares the lower-left corners
of which are the four corners of $J$. In addition, we make
one of two model assumptions on $Z_R$:

(40) with respect to the given pixel-lattice, $Z_R(\cdot)$
satisfies (39)

(41) $Z_R(t) = Z_R(\lfloor t \rfloor)$ for $t = (t_1, t_2) \in \mathbb{R}^2$

Assumption (41) means that we regard the reference-image
grey-level as homogeneous within each pixel.
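The two interpolation models can be sketched in a few lines. The function names and the indexing convention (`field[j][k]` holds the value at lattice point (j, k), with pixel width h = 1) are assumptions for illustration. The first function is the area-weighted average of assumption (39) and (40), which is ordinary bilinear interpolation; the second is the piecewise-constant model of assumption (41).

```python
import math


def interp_bilinear(field, t1, t2):
    """Value at the point (t1, t2) as the area-weighted average of the
    four surrounding lattice values. Requires the lattice points
    (j, k), (j+1, k), (j, k+1), (j+1, k+1) to exist in `field`."""
    j, k = math.floor(t1), math.floor(t2)
    f1, f2 = t1 - j, t2 - k          # fractional parts {t}_1 and {t}_2
    return ((1 - f1) * (1 - f2) * field[j][k]
            + f1 * (1 - f2) * field[j + 1][k]
            + (1 - f1) * f2 * field[j][k + 1]
            + f1 * f2 * field[j + 1][k + 1])


def interp_constant(field, t1, t2):
    """Value at (t1, t2) when the grey level is homogeneous within
    each pixel: the value at the lower-left lattice corner."""
    return field[math.floor(t1)][math.floor(t2)]
```

At the center of a pixel the bilinear value is the plain average of the four corner values, whereas the piecewise-constant model returns the lower-left corner value everywhere inside the pixel.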

Next suppose that based on a large "window" $[-T, T]
\times [-T, T]$ in our plane coordinates, we form the
"correlation-statistic"

$$C(t) = \frac{1}{4T^2} \int_{-T}^{T} \int_{-T}^{T} Z_R(x)\, Z_S(x - t)\, dx, \qquad t \in \mathbb{R}^2$$

which will have mean

$$D(t) = \frac{1}{4T^2} \int_{-T}^{T} \int_{-T}^{T} Z_R(x)\, Z_R(x + \theta - t)\, dx.$$

Assuming that $Z_R$ itself, although known, arose from a realization
of a strictly stationary ergodic random field, then
$D(t)$ has a well-defined limit as $T$ gets large and it follows
from work of [De] that $C(\cdot)$ considered as a plane random
field is approximately Gaussian. The main result of [La]
was the following (Lemma 3.1 and Corollary 3.2 specialized
question of subpixel estimation by bounding the accuracy of
interpolation possible for $D(t)$, i.e. by describing the
features of $Z_R$ (assumed to be a fixed realization of a smooth
strictly stationary random field) which in the absence of $Z_N$
limit the accuracy of recovery of $\theta$ from observations of
$D(\cdot)$ at pixel vertices. Under a further regularity condition
on the stationary random field generating $Z_R$ (existence
of second spectral moments), [La] found that as $T$ gets large
the error in determining $\theta$ by maximizing the local (Taylor
series) quadric approximant to $D(\cdot)$ is at most the smaller


O T'  A 1/2.  y A-\ 

^ = h ( ( 1/ 12)  /a  ) min  |secy£,  cscy0j 


k = <12  Zji,yL/*' 

2.  x * h 

where  h is  the  pixel-width  as  before,  L S [T/h]  , 


l '2  L L Z 2 

l , 5(2L  + I)  Z Z KV+V)  Z ( jh , kh)] 

*»  j =-L  k=-L  1 * R 


$\nabla_1 Z(x,y) \equiv Z(x,y) - Z(x-h,\ y)$, $\nabla_2 Z(x,y) \equiv Z(x,y) - Z(x,\ y-h)$,

and $a_0$ is the smallest eigenvalue (with $\gamma_0$ the angle the
corresponding eigenvector makes with the horizontal) of the
quadratic form

$$q(y) = (2L+1)^{-2} \sum_{j=-L}^{L} \sum_{k=-L}^{L} \bigl[(y_1 \nabla_1 + y_2 \nabla_2)\, Z_R(jh, kh)\bigr]^2$$

The size of $\min(K_1, K_2)$ might a priori be expected to determine
how much more accurate than $\hat\theta$ it is to estimate $\theta$ by
the maximizer $\hat\theta^{LS}$ of the local least-squares quadric surface
approximant to $C(\cdot)$ interpolating a 3x3 array of neighboring
pixel vertices.

Section 16 Objectives of the Simulation Study
The present simulation study had the following major
objectives:

i) to compile "empirical" results concerning performance
of $\hat\theta$ and $\hat\theta^{LS}$ on real and simulated reference images, in
the form of histograms of $\|\hat\theta - \theta\|$ and $\|\hat\theta^{LS} - \theta\|$ for various
values of offset $\theta$,

ii) to compare the performance of $\hat\theta$ chosen among pixel
vertices to give the largest value of $C(\cdot)$ with that of the
continuous-valued estimator $\hat\theta^{LS}$, and to check whether the
greater accuracy can simply be ascribed to allowing $\hat\theta^{LS}$ to
take values inside pixel-squares;

iii) to gain information on how large the standard deviation
of additive noise must be compared to grey-level
standard deviation in various reference images before pixel-level
and subpixel registration (estimation of $\theta$) is
seriously degraded;

iv) to check the validity and usefulness of the
theoretical results of [La] for 35x35 reference images,
window size $T = 10$, and $T_0 = 5$.
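The two estimators compared in objective ii) can be sketched as follows. This is an illustration under assumptions rather than the program used in the study: the function names are hypothetical, `c[j][k]` is assumed to hold the correlation statistic at lattice point (j, k), and the lattice maximum is assumed not to lie on the border of the array. The least-squares quadric is fitted to the 3x3 neighbourhood using a basis that is orthogonal on that grid, which gives the coefficients in closed form.

```python
def lattice_argmax(c):
    """Lattice point (j, k) at which the array c is largest."""
    best = max((c[j][k], j, k) for j in range(len(c)) for k in range(len(c[0])))
    return best[1], best[2]


def quadric_peak(c, j0, k0):
    """Maximum point of the least-squares quadric surface fitted to the
    nine values c[j0+dj][k0+dk], dj, dk in {-1, 0, 1}."""
    sx = sy = sxy = su = sv = 0.0
    for dj in (-1, 0, 1):
        for dk in (-1, 0, 1):
            z = c[j0 + dj][k0 + dk]
            sx += dj * z
            sy += dk * z
            sxy += dj * dk * z
            su += (dj * dj - 2.0 / 3.0) * z   # x^2 centred so it is orthogonal to 1
            sv += (dk * dk - 2.0 / 3.0) * z   # y^2 centred likewise
    # Coefficients of b*x + c*y + d*x^2 + e*x*y + f*y^2 (constant term not needed).
    b, cy, e = sx / 6.0, sy / 6.0, sxy / 4.0
    d, f = su / 2.0, sv / 2.0
    det = 4.0 * d * f - e * e
    if not (d < 0 and det > 0):
        raise ValueError("fitted quadric has no interior maximum")
    x = (e * cy - 2.0 * f * b) / det
    y = (e * b - 2.0 * d * cy) / det
    return j0 + x, k0 + y
```

When the array values are themselves samples of a quadratic surface, the fit is exact and the continuous maximizer recovers the true peak, while the lattice estimator can only return the nearest vertex.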

In the remainder of this Section, we specify some
notational conventions and tell what exactly was computed in
the simulation. To begin with, each (of six) reference
image used was standardized to a 35x35 array (j,k = -17,...,
+17) with average 0 and sample variance 1 (thus

n n , 

r e z,:(j,k)  = i , 

j <=-n  K=-n 

where we have adopted pixelwidth $h=1$). The offset-vector $\theta$
for each iteration in each simulation was chosen uniformly
in $[0,1] \times [0,1]$.


The correlation-statistic $C(\cdot)$ was computed, for each
lattice-point in the square $[-5,5]^2$, as follows. First, the
expectation-term $D(t)$ was calculated as a sum rather than
the integral in its definition above:

(42)
$$D(t) = (1/21)^2 \sum_{j=-10}^{10} \sum_{k=-10}^{10} Z_R(j,k)\, Z_R((j,k) + \theta - t)$$

This modification was made for two reasons: (1) although
the integral could, under either assumption (40) or (41), be
expressed as a weighted sum of terms $Z_R(x) Z_R(y)$, the
weights would depend on $\theta$ and it was computationally much
easier to make use of the equally plausible definition
(42); (2) in actual practice, in the absence of a
validated model assumption like (40) or (41), (42) is the
definition one would use, with sums similarly replacing
integrals in the definition of $C(\cdot)$. In each iteration of
each simulation $Z_N(\cdot)$ was simulated at lattice points (in
35x35 array) as

$$Z_N(j,k) = \sum_{a=-1}^{1} \sum_{b=-1}^{1} W(a,b)\, \varepsilon(j+a,\, k+b)$$

where $\{\varepsilon(j,k)\}$ is an array of independent
identically normally distributed random variables with mean 0
and variance $\sigma^2$, and the $W(j,k)$ are fixed weights
which took one of two forms:

$$W_1 = \begin{pmatrix} 1/36 & 1/9 & 1/36 \\ 1/9 & 1/4 & 1/9 \\ 1/36 & 1/9 & 1/36 \end{pmatrix} \quad \text{when (40) was assumed, and}$$

$$W_2 = \begin{pmatrix} 0 & 1/4 & 1/4 \\ 0 & 1/4 & 1/4 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{when (41) was assumed.}$$
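The noise simulation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_noise`, the dictionary layout, the index convention for applying the 3x3 weights, and the extra ring of random variables around the 35x35 array are all choices made here, not details given in the text.

```python
import random

# Weight matrices as printed in the text (rows listed top to bottom).
W1 = [[1/36, 1/9, 1/36],
      [1/9,  1/4, 1/9],
      [1/36, 1/9, 1/36]]
W2 = [[0.0, 1/4, 1/4],
      [0.0, 1/4, 1/4],
      [0.0, 0.0, 0.0]]


def simulate_noise(weights, n=17, sigma=1.0, rng=None):
    """Return {(j, k): Z_N(j, k)} for j, k = -n..n as a 3x3 moving average.

    Assumption: the underlying i.i.d. N(0, sigma^2) array is generated one
    lattice step beyond the image so every point has a full neighbourhood.
    """
    rng = rng or random.Random()
    eps = {(j, k): rng.gauss(0.0, sigma)
           for j in range(-n - 1, n + 2)
           for k in range(-n - 1, n + 2)}
    noise = {}
    for j in range(-n, n + 1):
        for k in range(-n, n + 1):
            # weights[a + 1][b + 1] plays the role of W(a, b), a, b = -1, 0, 1
            noise[(j, k)] = sum(weights[a + 1][b + 1] * eps[(j + a, k + b)]
                                for a in (-1, 0, 1) for b in (-1, 0, 1))
    return noise
```

Passing a seeded `random.Random` instance makes a run repeatable, which is convenient when the same random numbers are to be reused for several noise levels.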

Then $C(t) - D(t)$ was calculated as

(43)  $$C(t) - D(t) = (21)^{-2} \sum_{j=-10}^{10} \sum_{k=-10}^{10} Z_R(j,k)\, Z_N((j,k) - t)$$

In this definition we have replaced $(4T^2)^{-1}$ by $(21)^{-2}$
and modified some boundary terms, but (43) is otherwise the
same as in its double-integral definition if $Z_N(\cdot)$ had
been made up of independent $N(0,\sigma^2)$ variables at lattice
points and had been interpolated according to (39) while $Z_R$
was interpolated according to (40) or (41). (For example, under
(40),

$$\frac{1}{4T^2} \iint Z_R(x)\, Z_N(x-t)\, dx \;\approx\; \frac{1}{(2[T/h]+1)^2} \sum_i Z_R(i) \Bigl\{ \tfrac{4}{9} Z_N(i-t) + \tfrac{1}{9} \bigl( Z_N(i-t+e_1) + Z_N(i-t-e_1) + Z_N(i-t+e_2) + Z_N(i-t-e_2) \bigr) + \tfrac{1}{36} \bigl( Z_N(i-t+\mathbf{1}) + Z_N(i-t-\mathbf{1}) + Z_N(i-t+e_1-e_2) + Z_N(i-t+e_2-e_1) \bigr) \Bigr\}.)$$
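As a rough illustration of the discrete sum in (43), the following Python sketch evaluates the noise term at every lattice point of the search square. The function names and the dictionary representation of the arrays are assumptions made for this example; the 21x21 summation window and the range $[-5,5]^2$ follow the text.

```python
def noise_correlation_term(z_ref, z_noise, t, half_window=10):
    """Discrete version of (43): average of Z_R(j,k) * Z_N((j,k) - t)
    over the (2*half_window + 1)^2 central lattice points."""
    t1, t2 = t
    side = 2 * half_window + 1
    total = sum(z_ref[(j, k)] * z_noise[(j - t1, k - t2)]
                for j in range(-half_window, half_window + 1)
                for k in range(-half_window, half_window + 1))
    return total / side ** 2


def noise_term_array(z_ref, z_noise, t_max=5, half_window=10):
    """Evaluate the noise term at each lattice point t of [-t_max, t_max]^2."""
    return {(t1, t2): noise_correlation_term(z_ref, z_noise, (t1, t2), half_window)
            for t1 in range(-t_max, t_max + 1)
            for t2 in range(-t_max, t_max + 1)}
```

With a half-window of 10 and shifts of at most 5, every index stays within $\pm 15$, so a 35x35 array ($\pm 17$) is large enough without any boundary handling.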

Two simulation experiments were performed on the DEC 2060,
one with 450 iterations using weight-matrix $W_1$ and the other
with 250 iterations using weights $W_2$. For each iteration,
one offset $\theta$ and one array $\{Z_N\}$ were generated, and
for each of six reference images $D(t)$ and $C(t) - D(t)$ were
calculated according to (42) and (43) with $\sigma = 1$. Then
for each of a number of different values of $\sigma$, the
arrays $\{D(t) + \sigma(C(t) - D(t))\}$ (correlation-statistic
arrays corresponding to the noise-fields $\sigma Z_N(\cdot)$
generated from the same random numbers) were used to calculate
estimators $\hat\theta$ (the lattice-point $t$ corresponding to
the largest array element) and $\hat\theta^{LS}$ (the
maximum-point $(x,y)$ for the least-squares quadric surface for
the nine array values at $\hat\theta + (j,k)$,
$j, k = -1, 0, 1$). In addition, a third estimator was defined
as

$$\theta^* = \hat\theta + .5\,\bigl(\mathrm{sign}(\hat\theta^{LS}_1 - \hat\theta_1),\ \mathrm{sign}(\hat\theta^{LS}_2 - \hat\theta_2)\bigr).$$

For each of several values of $\sigma$ and each reference image
on each iteration, the distances $\|\hat\theta - \theta\|$,
$\|\hat\theta^{LS} - \theta\|$, and $\|\theta^* - \theta\|$ were
recorded. Output for the simulations consisted of histograms of
these distances, with bin-width .1 pixel for simulation 1 and
.125 pixel for simulation 2. The outputs are tabulated and
interpreted in Sections 18 and 19. Section 17 describes the six
reference images (three artificial, three real) along with the
corresponding quantities $\Gamma$, $H_\tau$, $K_1$, $K_2$
relevant for the theoretical predictions of Section 15.
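The three estimators defined in the simulation design can be sketched as follows. This is a minimal illustration, not the original program: the function names are invented here, the correlation array is assumed to be a dictionary keyed by lattice points, and the fallback to the lattice maximum (when the 3x3 neighbourhood is incomplete or the fitted quadric has no interior maximum) is an assumption, since the text does not say how such cases were handled. The closed-form coefficients are the ordinary least-squares solution for a full quadric on the nine points $\{-1,0,1\}^2$.

```python
def sign(u):
    return (u > 0) - (u < 0)


def lattice_argmax(corr):
    """Lattice point with the largest correlation value."""
    return max(corr, key=corr.get)


def quadric_peak(corr, centre):
    """Maximum point of the least-squares quadric through the 3x3 block
    of correlation values centred at `centre`."""
    c1, c2 = centre
    offs = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]
    if any((c1 + a, c2 + b) not in corr for a, b in offs):
        return (float(c1), float(c2))          # assumed fallback at the border
    v = {(a, b): corr[(c1 + a, c2 + b)] for a, b in offs}
    s0 = sum(v.values())
    # Least-squares coefficients of b1*x + b2*y + q11*x^2 + q12*x*y + q22*y^2
    b1 = sum(a * v[(a, b)] for a, b in offs) / 6.0
    b2 = sum(b * v[(a, b)] for a, b in offs) / 6.0
    q11 = (sum(a * a * v[(a, b)] for a, b in offs) - 2.0 * s0 / 3.0) / 2.0
    q22 = (sum(b * b * v[(a, b)] for a, b in offs) - 2.0 * s0 / 3.0) / 2.0
    q12 = sum(a * b * v[(a, b)] for a, b in offs) / 4.0
    det = 4.0 * q11 * q22 - q12 * q12
    if q11 >= 0 or det <= 0:
        return (float(c1), float(c2))          # assumed fallback: no maximum
    dx = (q12 * b2 - 2.0 * q22 * b1) / det     # stationary point of the quadric
    dy = (q12 * b1 - 2.0 * q11 * b2) / det
    return (c1 + dx, c2 + dy)


def star_estimator(lattice, ls):
    """Shift the lattice estimate half a pixel toward the LS estimate."""
    return (lattice[0] + 0.5 * sign(ls[0] - lattice[0]),
            lattice[1] + 0.5 * sign(ls[1] - lattice[1]))
```

For a correlation surface that is exactly quadratic, the least-squares fit recovers its true peak, which gives a convenient check of the formulas.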


grey-level arrays chosen more or less arbitrarily from an
80x125 LANDSAT image of a rural area in the United States
including cultivated fields, some wooded areas, and some
roads. The other three images were artificially constructed,
as follows:


For image 1,

$$Z_R(j,k) = 55.0 - 1.5\,(|j| + |k|), \qquad j, k = -17, -16, \ldots, +17;$$

For image 2,

$$Z_R(j,k) = \begin{cases} 0 & \text{if } \max(|j|,|k|) \ge 3 \\ 40 & \text{if } \max(|j|,|k|) \le 2 \end{cases}$$

For image 6,

$$Z_R(j,k) = \begin{cases} 20 & \text{if } \max(j,k) \le 0 \\ 10 & \text{if } \max(j,k) > 0 \end{cases}$$
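The three artificial images, together with the standardisation to average 0 and sample variance 1 described in the simulation design, can be written out directly. The sketch below is illustrative only: the function names are invented here, and the sample variance is taken with divisor $N = 35^2$ (an assumption; the text does not say whether $N$ or $N-1$ was used).

```python
def image1(j, k):
    return 55.0 - 1.5 * (abs(j) + abs(k))


def image2(j, k):
    return 40.0 if max(abs(j), abs(k)) <= 2 else 0.0


def image6(j, k):
    return 20.0 if max(j, k) <= 0 else 10.0


def make_image(rule, n=17):
    """Tabulate a grey-level rule on the (2n+1) x (2n+1) lattice."""
    return {(j, k): rule(j, k)
            for j in range(-n, n + 1) for k in range(-n, n + 1)}


def standardise(raw):
    """Rescale to average 0 and sample variance 1 (divisor N, assumed)."""
    count = len(raw)
    mean = sum(raw.values()) / count
    var = sum((v - mean) ** 2 for v in raw.values()) / count
    scale = var ** 0.5
    return {p: (v - mean) / scale for p, v in raw.items()}
```

For example, `standardise(make_image(image2))` gives the small bright square of image 2 on the scale used in the simulations.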

Table 17.1 contains the values of $H_\tau$, $K_1$, and $K_2$ for
the six reference images. We recall that these quantities
depend only on the reference image and not on the
noise-covariances. On the other hand, $\Gamma$ and $\gamma$ do
depend on the covariances of $Z_N(\cdot)$; their values are
displayed in Table 17.2 for all six reference images under the
assumption (40) with weight-matrix $W_1$ and $N(0,1)$ noise
variables.


TABLE 17.1

(A)  $\tau$ vs. $H_\tau$ for six reference images

    tau     Image 1   Image 2   Image 3   Image 4   Image 5   Image 6
    0.7      .014      .212      .197      .187      .244      .034
    1.4      .027      .382      .461      .401      .434      .069
    2.1      .062      .551      .662      .518      .530      .103
    2.8      .098      .636      .662      .518      .608      .103
    3.5      .150      .806      .669      .580      .643      .137
    4.2      .202      .890      .669      .590      .678      .172
    4.9      .235      .975      .669      .590      .678      .172
    5.6      .329     1.0        .720     1.0        .734      .322
    6.3      .399     1.0        .850     1.0        .770      .372
    7.0      .469     1.0        .895     1.0        .866      .422

(B)  $K_1$ and $K_2$ for reference images

            Image 1   Image 2   Image 3   Image 4   Image 5   Image 6
    K_1      .183      .816     1.26      1.30      1.54       .60
    K_2     5.48      1.22       .813      .784      .747     1.66

TABLE 17.2.  $\Gamma$, $\gamma(1)$, and $\gamma(1.414)$ values

    Image     Gamma     gamma(1)   gamma(1.414)
      1       .0453      .0075       .0105
      2       .0615      .0207       .0284
      3       .0568      .028        .034
      4       .0603      .030        .035
      5       .0581      .025        .029
      6       .0536      .016        .022

TABLE 17.3.  Smallest $\tau$ (in multiples of .7) for which
$x^*(\tau) \ge 4.5$, for six reference images and four values of
$\sigma$.

    Image    sigma = 1    sigma = .5    sigma = .25    sigma = .125
      1         4.9          3.5           2.1            2.1
      2         1.4           .7            .7             .7
      3         1.4           .7            .7             .7
      4         1.4           .7            .7             .7
      5         1.4           .7            .7             .7
      6         5.6          4.2           2.1            1.4

We have given values for $\gamma(1)$ and $\gamma(1.414)$.
Actually, only the values of $\gamma(u)$ for $u$ in a bounded
range above 0 are relevant to estimating values of $x(\tau)$.
For purposes of approximate calculation, we treat
$\gamma(\cdot)$ as being linear on [0,1], in which case the
integral term in $\gamma(u)\,du$ appearing in $x(\tau)$ is less
than $.75\,\gamma(1)$. In further calculations, we therefore
replace $x(\tau)$ by

$$x^*(\tau) = H_\tau / (\Gamma + .75\,\gamma(1)).$$

Now according to the Proposition of Section 15, with
$T_0 = 5$, $T = 10$,

$$P[\,\|\hat\theta - \theta\| \ge \tau\,] \;\le\; 1.35\,\bigl(e^{-x(\tau)^2/2} / x(\tau)\bigr)\,10^4 .$$

The right-hand side of this inequality is approximately 1.1
for $x = 4$, .12 for $x = 4.5$ and .01 for $x = 5$. Thus we have
tabulated (in Table 17.3) for all six reference images, the
smallest $\tau$ (in multiples of .7 pixels) for which
$x^*(\tau) \ge 4.5$. Note that reducing $\sigma$ by a factor 1/2
does not change $H$ but multiplies both $\Gamma$ and $\gamma$ by
1/2, so that $x^*$ is inversely proportional to $\sigma$. Also
note that by our definitions $\{C(t) - D(t)\}$ is a strictly
stationary Gaussian random field not depending on $\theta$.
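The arithmetic behind Table 17.3 can be retraced from Tables 17.1(A) and 17.2. The Python sketch below is a reading of the text rather than the original computation: it assumes the bound has the form $1.35 \cdot 10^4\, e^{-x^2/2}/x$ (which reproduces the three quoted values 1.1, .12 and .01), that $x^*(\tau) = H_\tau/(\Gamma + .75\gamma(1))$ at $\sigma = 1$, and that $x^*$ scales as $1/\sigma$. All names are invented for this example.

```python
import math

TAUS = [0.7, 1.4, 2.1, 2.8, 3.5, 4.2, 4.9, 5.6, 6.3, 7.0]

# H_tau from Table 17.1(A), one list per image, in the order of TAUS.
H = {
    1: [.014, .027, .062, .098, .150, .202, .235, .329, .399, .469],
    2: [.212, .382, .551, .636, .806, .890, .975, 1.0, 1.0, 1.0],
    3: [.197, .461, .662, .662, .669, .669, .669, .720, .850, .895],
    4: [.187, .401, .518, .518, .580, .590, .590, 1.0, 1.0, 1.0],
    5: [.244, .434, .530, .608, .643, .678, .678, .734, .770, .866],
    6: [.034, .069, .103, .103, .137, .172, .172, .322, .372, .422],
}
# Gamma and gamma(1) from Table 17.2 (noise level sigma = 1).
BIG_GAMMA = {1: .0453, 2: .0615, 3: .0568, 4: .0603, 5: .0581, 6: .0536}
GAMMA_1 = {1: .0075, 2: .0207, 3: .028, 4: .030, 5: .025, 6: .016}


def bound(x):
    """Assumed form of the right-hand side of the probability inequality."""
    return 1.35e4 * math.exp(-x * x / 2.0) / x


def x_star(image, tau_index, sigma=1.0):
    denom = sigma * (BIG_GAMMA[image] + 0.75 * GAMMA_1[image])
    return H[image][tau_index] / denom


def smallest_tau(image, sigma=1.0, level=4.5):
    """Smallest tabulated tau with x*(tau) >= level, or None if none."""
    for index, tau in enumerate(TAUS):
        if x_star(image, index, sigma) >= level:
            return tau
    return None
```

Under these assumptions `smallest_tau` reproduces the entries of Table 17.3 for all six images and all four values of $\sigma$, for example 4.9 for image 1 at $\sigma = 1$ and 1.4 for image 6 at $\sigma = .125$.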


Section 18. Simulation Results

The histograms produced for $\|\hat\theta - \theta\|$,
$\|\hat\theta^{LS} - \theta\|$, and $\|\theta^* - \theta\|$,
according to the simulation design described in Section 16,
are tabulated in a slightly different form in Tables 18.1
and 18.2. Defining empirical distribution functions for each
simulation by

$$F(x) = \frac{\#\{\text{iterations for which } \|\hat\theta - \theta\| \le x\}}{\#\{\text{iterations}\}}, \qquad F^{LS}(x) = \frac{\#\{\text{iterations for which } \|\hat\theta^{LS} - \theta\| \le x\}}{\#\{\text{iterations}\}},$$

we display for each simulation and each reference image, for
three selected values of $\sigma$, the values $F(x)$,
$F^{LS}(x)$ for $.1 \le x \le 1.7$ in increments of .1 in
simulation 1, and $.125 \le x \le 1.5$ in increments of .125 in
simulation 2. We have not tabulated the results with the
artificial estimator $\theta^*$, which was introduced to see if
$\hat\theta^{LS}$ derived its accuracy simply by allowing values
inside pixels, because $\theta^*$ turned out to be so
conclusively inferior to $\hat\theta$ and to $\hat\theta^{LS}$.
To show this vividly, we consider Table 18.3, in which are
displayed the empirical upper quartile points (75th
percentiles) of $\|\hat\theta - \theta\|$,
$\|\hat\theta^{LS} - \theta\|$, and $\|\theta^* - \theta\|$ for
each reference image and each of three values of $\sigma$.
These were calculated by linearly interpolating the empirical
distribution functions from simulation 1 to find the $x$
corresponding to distribution function value .75.
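The empirical distribution function and the interpolated upper quartile can be illustrated with a short Python sketch. The function names are invented for this example, and the interpolation rule (linear between the two tabulated points that bracket the level) is one straightforward reading of "linearly interpolating the empirical distribution functions".

```python
def empirical_df(distances, x):
    """Fraction of recorded distances that are <= x."""
    return sum(1 for d in distances if d <= x) / len(distances)


def interpolated_quantile(xs, fs, level=0.75):
    """Linear interpolation of tabulated d.f. values (xs[i], fs[i]) to find
    the x at which the d.f. reaches `level`; None if it is never bracketed."""
    points = list(zip(xs, fs))
    for (x0, f0), (x1, f1) in zip(points, points[1:]):
        if f0 <= level <= f1 and f1 > f0:
            return x0 + (x1 - x0) * (level - f0) / (f1 - f0)
    return None


# Image 1 at the smallest noise level, as tabulated in Table 18.1a.
XS = [.1, .2, .3, .4, .5, .6, .7, .8, .9]
F_HAT = [.016, .091, .207, .318, .429, .560, .660, .724, .800]
F_LS = [.038, .173, .278, .464, .573, .711, .802, .884, .938]
```

Applied to these two columns, the interpolation gives roughly .83 and .64 pixel, which agree with the first two entries for Image 1 at the smallest noise level in Table 18.3.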

TABLE 18.1.  Empirical distribution functions $F$ and
$F^{LS}$ (in parentheses) for each of six reference images and
three values of $\sigma$, from simulation 1 (450 iterations)

(18.1a)  Image 1

     x      sigma = .10     sigma = .20     sigma = .30
     .1     .016 (.038)     .007 (.009)     .004 (.004)
     .2     .091 (.173)     .027 (.036)     .013 (.016)
     .3     .207 (.278)     .067 (.091)     .038 (.036)
     .4     .318 (.464)     .107 (.131)     .058 (.064)
     .5     .429 (.573)     .173 (.200)     .102 (.098)
     .6     .560 (.711)     .224 (.278)     .127 (.131)
     .7     .660 (.802)     .267 (.351)     .169 (.178)
     .8     .724 (.884)     .336 (.431)     .193 (.236)
     .9     .800 (.938)     .400 (.500)     .244 (.291)
    1.0                     .458 (.569)     .289 (.311)
    1.1                     .540 (.609)     .349 (.393)
    1.2                     .622 (.678)     .400 (.442)
    1.3                     .664 (.740)     .440 (.480)
    1.4                     .724 (.804)     .484 (.522)
    1.5                     .776 (.849)     .527 (.567)
    1.6                     .824 (.880)     .582 (.616)
    1.7                     .856 (.907)     .616 (.66)


(18.1c)  Image 3   ($F^{LS}(x)$ in parentheses)

     x      sigma = .8      sigma = 1.2
     .1     .029 (.171)     .029
     .2     .142 (.524)     .142
     .3     .320 (.802)     .311
     .4     .509 (.911)     .480
     .5     .733 (.969)     .676
     .6     .889 (.989)     .827
     .7     .962 (.996)     .911
     .8     .987 (.998)     .958
     .9    1.0   (1.0)      .982
    1.0                     .987
    1.1                     .991
    1.2                     .991
    1.3                     .991 (.987)
    1.4                     .991 (.987)
    1.5                     .991 (.987)
    1.6                     .991 (.987)
    1.7                     .993 (.989)

(18.1d)  Image 4   ($F^{LS}(x)$ in parentheses)

     x      sigma = .4      sigma = .8      sigma = 1.2
     .1     .029 (.180)     .029 (.140)     .029 (.067)
     .2     .142 (.556)     .142 (.400)     .140 (.276)
     .3     .322 (.869)     .318 (.658)     .302 (.484)
     .4     .527 (.980)     .513 (.811)     .478 (.638)
     .5     .789 (.996)     .733 (.924)     .651 (.751)
     .6     .933 (1.0)      .873 (.971)     .800 (.847)
     .7     .980            .931 (.987)     .871 (.911)
     .8     .996            .969 (.989)     .924 (.938)
     .9    1.0              .991 (.993)     .956 (.960)
    1.0                     .996 (.993)     .969 (.971)
    1.1                     .996 (.993)     .973 (.976)
    1.2                     .996 (.993)     .976 (.976)
    1.3                     .996 (.998)     .978 (.980)
    1.4                     .998 (.998)     .982 (.980)
    1.5                     .998 (.998)     .987 (.982)
    1.6                     .998 (.998)     .989 (.989)
    1.7                     .998 (1.0)      .991 (.989)

(18.1e)  Image 5   ($F^{LS}(x)$ in parentheses)

     x      sigma = .4      sigma = .8      sigma = 1.2
     .1     .029 (.318)     .029 (.127)     .029 (.064)
     .2     .142 (.798)     .142 (.418)     .138 (.229)
     .3     .322 (.964)     .318 (.682)     .300 (.424)
     .4     .527 (.998)     .509 (.847)     .458 (.611)
     .5     .791 (1.0)      .742 (.927)     .660 (.731)
     .6     .933            .876 (.971)     .784 (.807)
     .7     .991            .978 (.982)     .882 (.864)
     .8    1.0              .991 (.993)     .916 (.902)
     .9                     .998 (.996)     .951 (.924)
    1.0                     .998 (.996)     .962 (.940)
    1.1                     .998 (1.0)      .962 (.956)
    1.2                     .998            .964 (.967)
    1.3                     .998            .971 (.969)
    1.4                     .998            .976 (.978)
    1.5                     .998            .976 (.978)
    1.6                     .998            .978 (.980)
    1.7                     .998            .980 (.980)


(18.1f)  Image 6   $F(x)$ ($F^{LS}(x)$ in parentheses)

     x      sigma = .4      sigma = .8      sigma = 1.2
     .1     .029 (0.)       .029 (.016)     .024 (.013)
     .2     .142 (.018)     .127 (.073)     .102 (.102)
     .3     .307 (.120)     .264 (.193)     .220 (.198)
     .4     .447 (.369)     .396 (.360)     .331 (.333)
     .5     .604 (.631)     .562 (.516)     .467 (.438)
     .6     .720 (.864)     .673 (.704)     .573 (.567)
     .7     .856 (.978)     .784 (.829)     .664 (.664)
     .8     .942 (.998)     .878 (.911)     .749 (.740)
     .9     .980 (1.0)      .931 (.944)     .804 (.791)
    1.0     .998            .964 (.964)     .836 (.824)
    1.1    1.0              .976 (.971)     .856 (.856)
    1.2                     .978 (.980)     .860 (.871)
    1.3                     .978 (.982)     .867 (.884)
    1.4                     .982 (.982)     .871 (.887)
    1.5                     .984 (.987)     .884 (.898)
    1.6                     .984 (.987)     .889 (.907)
    1.7                     .987 (.987)     .893 (.918)

Table 18.2.  Empirical d.f.'s $F$ and $F^{LS}$ (in parentheses)
for each of six reference images and two values of $\sigma$,
from simulation 2 (250 iterations).

(18.2ab)  Images 1 and 2

                     Image 1                        Image 2
     x       sigma = .1    sigma = .3      sigma = .2    sigma = .6
    .125    .028 (.080)   .000 (.008)     .032 (.308)   .024 (.028)
    .25     .092 (.256)   .008 (.036)     .160 (.796)   .112 (.164)
    .375    .248 (.424)   .052 (.088)     .448 (.944)   .280 (.296)
    .5      .396 (.604)   .096 (.108)     .748 (.992)   .428 (.460)
    .625    .572 (.728)   .136 (.152)     .916 (1.0)    .536 (.596)
    .75     .700 (.832)   .176 (.200)     .988          .660 (.672)
    .875    .788 (.908)   .216 (.248)    1.0            .724 (.712)
   1.0      .840 (.972)   .264 (.328)                   .756 (.752)
   1.125    .908 (.988)   .324 (.388)                   .780 (.796)
   1.25     .952 (.992)   .42  (.46)                    .796 (.820)
   1.375    .984 (.996)   .508 (.512)                   .816 (.844)
   1.5      .996 (.996)   .556 (.564)                   .836 (.852)

(18.2cdef)  Images 3, 4, 5, 6

                     Image 3                        Image 4
     x
    .125    .032 (.412)   .032 (.128)     .032 (.128)   .032 (.12)
    .25     .16  (.900)   .152 (.392)     .16  (.736)   .152 (.332)
    .375    .444 (.992)   .412 (.656)     .448 (.952)   .416 (.556)
    .5      .748 (1.0)    .644 (.816)     .776 (.996)   .676 (.732)
    .625    .932          .852 (.904)     .940 (1.0)    .812 (.832)
    .75     .996          .936 (.956)    1.0            .936 (.876)
    .875   1.0            .968 (.964)                   .968 (.92)
   1.0                    .984 (.984)                   .98  (.944)
   1.125                  .992 (.988)                   .984 (.956)
   1.25                   .996 (.988)                   .988 (.968)
   1.5                    .996 (.992)                   .988 (.976)

                     Image 5                        Image 6
     x       sigma = .4    sigma = 1.2     sigma = .12   sigma = .36
    .125    .032 (.452)   .032 (.096)     .032 (0.)     .028 (.024)
    .25     .16  (.90)    .148 (.288)     .156 (.044)   .124 (.088)
    .375    .444 (.996)   .388 (.512)     .324 (.204)   .292 (.224)
    .5      .764 (1.0)    .628 (.676)     .488 (.576)   .432 (.42)
    .625    .952          .792 (.804)     .672 (.876)   .568 (.564)
    .75     .996          .884 (.884)     .86  (.98)    .70  (.692)
    .875   1.0            .908 (.896)     .968 (.996)   .78  (.74)
   1.0                    .92  (.908)     .996 (1.0)    .82  (.816)
   1.125                  .928 (.92)     1.0            .84  (.86)
   1.25                   .94  (.936)                   .856 (.892)
   1.5                    .948 (.96)                    .888 (.916)
Table 18.3.  Triples of empirical 75th percentile values for
($\|\hat\theta - \theta\|$, $\|\hat\theta^{LS} - \theta\|$,
$\|\theta^* - \theta\|$) from simulation 1 (450 iterations), for
each reference image and each of three values of $\sigma$.

    sigma   Image 1               sigma   Image 2             sigma   Image 6
     .10    (.83, .64, .80)        .2     (.50, .24, .58)      .12    (.62, .55, .90)
     .20    (1.45, 1.32, 1.31)     .4     (.58, .46, .73)      .24    (.67, .64, .97)
     .30    (2.1, 1.86, 1.94)      .6     (.86, .8, 1.02)      .36    (.8, .82, 1.1)

    sigma   Image 3               sigma   Image 4             sigma   Image 5
     .4     (.49, .19, .53)        .4     (.49, .26, .59)      .4     (.48, .19, .55)
     .8     (.51, .28, .60)        .8     (.51, .36, .66)      .8     (.50, .44, .64)
    1.2     (.55, .41, .68)       1.2     (.57, .50, .75)     1.2     (.57, .52, .80)

To complete this Section, we now discuss the accuracy of
the empirically estimated numbers in Tables 18.1-18.3. All
the distribution function values $p$ are with approximate
probability $1 - \alpha$ contained in the symmetric interval of
length

$$[p(1-p)]^{1/2}\, \Phi^{-1}(1 - \alpha/2) / \sqrt{n}$$

around the empirically estimated values, where $\Phi$ is the
standard normal distribution function and $n$ is the number of
iterations in the simulation. With $n = 450$, substituting 1/2
for $p$, we find the conservative $(1-\alpha)$-quantiles for
each $t$:

    percentage points for $|F_{est}(t) - F(t)|$:
        .019    alpha = .10
        .023    alpha = .05
        .026    alpha = .02

In order to take account of our having estimated d.f.-values
$F(t)$ by empirical estimates $F_{est}(t)$ for many $t$
simultaneously, the Kolmogorov-Smirnov approximate percentage
points for $n = 450$ are relevant:

    percentage points for $\sup_t |F_{est}(t) - F(t)|$:
        .058    alpha = .10
        .064    alpha = .05
        .077    alpha = .01

Finally, in Table 18.3 we have empirically estimated upper
quartiles for random variables like $\|\hat\theta - \theta\|$.
Although it is hard to assess the accuracy of the linear
interpolation we have used, the ordinary binomial-normal
confidence interval (with $n = 450$) for any $t$ near the upper
quartile of $F(\cdot)$ (with $F(t)$ near 3/4) yields $F(t)$ with
98% probability in the range $F_{est}(t) \pm .02$. Therefore we
can ascribe extremely high confidence to the first decimal
place of the upper-quartile estimates, and if $F(\cdot)$ (e.g.
the d.f. of $\|\hat\theta - \theta\|$) were approximately linear
within increments of .1 for $x$ between 0 and 1.7, we could
have approximately 98% confidence that the error in upper
quartile estimates would be at most $\pm .02$.
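The Kolmogorov-Smirnov percentage points quoted above can be checked with the usual large-sample approximation $c(\alpha)/\sqrt{n}$, where $c(\alpha) = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$ is the leading term of the limiting distribution. The text does not say which approximation was actually used, so the formula below is an assumption that happens to reproduce the three tabulated values; the function name is invented for this example.

```python
import math


def ks_percentage_point(alpha, n):
    """Approximate two-sided Kolmogorov-Smirnov critical value for the
    sup-distance between an empirical d.f. from n iterations and the
    true d.f., using the leading term of the asymptotic distribution."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(ks_percentage_point(alpha, 450), 3))
```

The same function with `n = 250` gives the corresponding, somewhat wider, bands for simulation 2.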
Section 19. Interpretation of Results: Conclusions

Our first and immediate conclusion from comparing Table
17.1(B) with Table 18.1 is that the figures-of-merit $K_1$ and
$K_2$ for subpixel estimation are on the one hand too crude to
be of use, since the order of subpixel estimation they allow
without noise is .8 pixel or worse for images 2-5, and on the
other hand not at all predictive of either the size of
$\|\hat\theta - \theta\|$ or the improvement of
$\|\hat\theta^{LS} - \theta\|$ over $\|\hat\theta - \theta\|$.
Although one could hope to refine these figures-of-merit by
estimation of further spectral moments of $Z_R(\cdot)$, the
payoff would seem to be much too small for the stringency of
assumptions on reference images which one would have to
impose. We therefore do not recommend the use of such
noise-free figures of merit for subpixel registration.

The second obvious conclusion of our study has already
been mentioned but should be expanded: not only do the
simulation results in Tables 18.1-18.3 establish the
superiority of $\hat\theta^{LS}$ over $\hat\theta$ in estimating
$\theta$, but the artificial estimator $\theta^*$ (which
attempts to bridge the gap between $\hat\theta$ and
$\hat\theta^{LS}$ by shifting $\hat\theta$ to the center nearest
$\hat\theta^{LS}$ of a pixel with vertex $\hat\theta$) is
markedly worse than both! In other words, for the types of
moving-average Gaussian noise fields studied, the subpixel
improvement of $\hat\theta$ by $\hat\theta^{LS}$ makes
$\hat\theta^{LS}$ the estimator of choice for $\theta$ (in the
absence of more detailed geometric information about $Z_R$).

Some quantitative discussion of the simulation results
will give a sharper focus to our conclusions. Considering
Table 18.3 first, we see that the accuracy of $\hat\theta$ is
relatively insensitive to the noise-level parameter $\sigma$
for the real reference images (3-5) and that
$\|\hat\theta - \theta\|$ is less than .5 pixel, for $\sigma$
between .4 and 1.2, roughly 75% of the time. For these images,
$\|\hat\theta^{LS} - \theta\|$ has upper-quartile ranging from
.2 to .5 pixels as $\sigma$ ranges from .4 to 1.2, and the
advantage of $\hat\theta^{LS}$ over $\hat\theta$ deteriorates as
$\sigma$ gets larger than 1.0. Indeed, Tables 18.1 and 18.2
strongly support the following generalization: for images 2,
3, 4, and 5, when $\|\hat\theta - \theta\|$ is less than about
.6 pixel, $\|\hat\theta^{LS} - \theta\|$ is (stochastically)
smaller than $\|\hat\theta - \theta\|$ by .1 pixel or more for
small $\sigma$ (but this advantage is diluted by larger
$\sigma$). Quite generally, for all six images, there seems to
be no advantage of $\hat\theta^{LS}$ over $\hat\theta$ when
$\|\hat\theta - \theta\|$ is .9 pixel or more. Images 1 and 6
(both artificial, with strong geometric structure, and quite
nonstationary) are special in (i) showing very little
advantage for $\hat\theta^{LS}$ over $\hat\theta$, except for the
smallest value of $\sigma$, and (ii) showing very rapid loss of
accuracy as $\sigma$ increases (e.g., the upper quartiles in
Table 18.3 for $\|\hat\theta - \theta\|$ are larger for Images 1
and 6 than for the other images, with $\sigma$ only half as
large or less).

The last topic requiring detailed comment is the
comparison of predictions in Table 17.3 with empirical results
in Tables 18.1 and 18.5. The special features of Images 1 and 6
are, if anything, more sharply brought out in Table 17.3 than
in the empirical results, reflecting in part the conservatism
of the probability inequality of Section 15. However, the
theoretical inequality together with Table 17.3 very
satisfactorily shows the subpixel accuracy of registration
attainable on images 2-5. This, of course, is borne out
strongly both in simulation 1 (to which Tables 17.1(a), 17.2,
and 17.3 are directly relevant) and in simulation 2.

Summary 

According both to theoretical inequalities and the
simulation study reported here, automatic subpixel
registration with respect to real grey-level reference images
(assumed to be observed translated, with a stationary noise
field added to the pixel grey-levels) seems quite feasible.
The present simulation study, one of the first systematic
performance evaluations of the maximum-correlation method of
image-registration and of a known effective variant based on
maximizing a least-squares quadric surface locally
approximating the (discrete) correlation-statistic near its
(discrete) maximum, shows that even if the additive noise
has standard deviation as large as that of the (35x35)
reference image, the upper quartile of the error in
Section 20. Conclusions

We have developed geometric and probabilistic models
for subpixel accuracy in image registration and edge
location. These models have been used to develop and analyze
procedures to perform these tasks. Initial experiments
indicate that a high level of subpixel accuracy may be
attainable with the grey level geometric methods, though
considerable experimentation will be required to validate
this. Our analysis of digital lines methods is reasonably
complete and indicates an average error of about 1/20
pixel. This result, which was based on restrictive
assumptions, led to direct edge estimation procedures using
the digitization of an edge (including grey levels). This
method, which was briefly tested on grey-level imagery
formed from Landsat data, gave similar accuracy without
relying on the restrictions of the strictly geometric
method.

An estimate for determining the error in using the
peak of the cross-correlation between sensed and reference
images as an estimate of the offset was developed.
Simulations were used to determine the reliability of the
error estimate and to determine the errors resulting from
interpolation of the correlation function to locate a subpixel
peak. The level of subpixel accuracy as a function of the
signal noise was analyzed using simulations.

The primary direction for future work will be the
analysis and testing of the procedure for estimating real
edge location using the artificial edge digitizations as
masks. Reasonable directions of research include further
testing on noisy imagery, computational optimization, and
extension of our previous analysis of geometric registration
to incorporate random grey level noise for the analysis of
this new procedure.